To my home, Thales and Tshabalala.




I have the right to be angry, to express that anger, to take it as the motivation for my struggle, just as
I have the right to love, to express my love to the world, to take it as the motivation for my struggle,
because, being historical, I live History as a time of possibility, not of determination. If reality were
as it is because it had been decreed that it must be so, there would not even be a reason to be angry. My
right to anger presupposes that, in the historical experience in which I take part, tomorrow is not
something "pre-given", but a challenge, a problem. My anger, my just ire, is founded on my revolt against
the denial of the right to "be more" that is inscribed in the nature of human beings. I cannot, therefore,
fatalistically fold my arms in the face of misery, thus emptying my responsibility into the cynical and
"lukewarm" discourse that speaks of the impossibility of change because reality is just the way it is. The
discourse of accommodation or of its defense, the discourse that exalts the imposed silence from which the
immobility of the silenced results, the discourse that praises adaptation taken as fate or destiny, is a
discourse that denies humanization, a responsibility we cannot shirk. Adaptation to situations that deny
humanization can only be accepted as a consequence of the experience of domination, or as an exercise of
resistance, as a tactic in the political struggle. I give the impression that I accept today the condition
of being silenced in order to fight well, when I can, against the denial of myself. This question, that of
the legitimacy of anger against fatalistic docility in the face of the denial of people, was a theme
implicit in our entire conversation that morning.


That is also why the naive or, worse, astutely neutral position of those who study, be it the physicist,
the biologist, the sociologist, the mathematician, or the thinker of education, seems to me neither
possible nor acceptable. No one can be in the world, with the world and with others in a neutral way. I
cannot be in the world with gloves on my hands, merely observing. Accommodation in me is only a path to
insertion, which implies decision, choice, intervention in reality. There are questions to be asked
insistently by all of us, which make us see the impossibility of studying for the sake of studying. Of
studying without commitment, as if, mysteriously and suddenly, we had nothing to do with the world, a
distant world out there, alienated from us and we from it.

In favor of what do I study? In favor of whom? Against what do I study? Against whom do I study?


But as determined as before in the struggle for an education that, as an act of knowledge, does not merely
center on the teaching of content but challenges the learner to venture into the exercise of not only
speaking about changing the world, but of truly committing to it. That is why, for me, one of the essential
contents of any educational program, be it syntax, biology, physics, mathematics, or social sciences, is
the one that makes possible the discussion of the mutable nature of natural as well as historical reality
and sees men and women as beings not only capable of adapting to the world but above all of changing it.
Curious, active, speaking, creative beings.


With the will weakened, resistance fragile, identity placed in doubt, and self-esteem in tatters, one
cannot fight. In this way, one does not fight against exploitation by the dominant classes, just as one
does not fight against the power of alcohol, tobacco or marijuana. Just as one cannot fight, for lack of
courage, will, and rebelliousness, if one has no tomorrow, if one has no hope. The "ragged of the world"
lack a tomorrow, just as those subjugated by drugs lack a tomorrow. That is why every liberating
educational practice, valuing the exercise of will, decision, resistance, and choice; the role of emotions,
feelings, desires, and limits; the importance of consciousness in history, the ethical meaning of the human
presence in the world, the understanding of history as possibility, never as determination, is
substantively hopeful and, for that very reason, a provoker of hope.


Paulo Freire, excerpts from Pedagogia da Indignação.


Hope

Dances on the tightrope with a parasol
And at every step along that line
It may get hurt
Tough luck!

Tightrope-walking hope
Knows that every artist's show
Must go on


Aldir Blanc and João Bosco, O Bêbado e a Equilibrista.
Introduction


It was necessary for philosophers and
other abstract thinkers to be already
half lost in the forest of
their own elucubrations on the
almost and the zero, which is the
plebeian way of saying being and nothingness,
for common sense to
present itself prosaically,
paper and pencil in hand, to
demonstrate by a + b + c that
there were much more
urgent questions to think about.

José Saramago, As Intermitências
da Morte.


Quantum theory provides a set of rules to predict the probabilities of different outcomes in
different experimental settings. While its predictions match the data from actual experiments
with extreme accuracy, it has some peculiar properties that set it apart from how we normally
think about systems with a probabilistic description. Two of these "strange" characteristics
are contextuality and nonlocality. The former tells us that we cannot think of a measurement
on a quantum system as revealing a property that is independent of the set of measurements we
choose to make. The latter describes how measurements made by spatially separated observers on
a multipartite quantum system can exhibit extremely strong correlations. Contextuality
and nonlocality are the most striking features of quantum theory. We believe that a
complete understanding of these features may be the most important step towards
understanding the whole theory.

The need for probabilities in the description of an experiment arises naturally
when we do not control all the parameters involved in it. Our classical intuition
leads us to think that, if we could control our devices with perfect accuracy, two repetitions
of the same procedure with exactly the same value for every possible parameter would have to
provide the same result at the end. It is natural to imagine that two replicas of the same
object will remain identical if they are subjected to exactly the same process. If this is
not the case, we would have no reason to call them identical in the first place.

Quantum theory, on the other hand, does not provide definite outcomes for the
measurements, even if we have complete knowledge of the state of the system. If we
have a large set of quantum systems, all prepared in the same state, we can apply the
same measurement to all of them, obtaining a probability distribution that in general will
exhibit dispersion. This means that, for almost all measurements, at least two outcomes
have probability larger than zero. If we apply the argument of the previous paragraph, we
would conclude that the systems in this set could not be identical and hence they could
not all be in the same state. Hence, the state assigned to this preparation by quantum
theory cannot be everything: there are more parameters we must use in the description
of these systems in order to get definite outcomes for all measurements. These unknown
parameters may have different values in our set of systems, and the probabilistic behavior
is due to our lack of knowledge about these "hidden variables."

This line of thought led many physicists to believe that quantum theory might be
incomplete. Hence, they conjectured the possibility of completing quantum theory by adding
extra variables to the quantum description, in such a way that with all this information
(quantum state plus extra variables) we would be able to predict with certainty the
outcome of all measurements, and that averaging over these extra variables
would recover the quantum predictions. This kind of completion of quantum theory is
often called a hidden-variable model.

With some very reasonable extra assumptions on these models, we get a contradiction
with the predictions of quantum theory. If the value associated by the model to
a measurement is independent of what other compatible measurements are jointly
performed, we say that the model satisfies the noncontextuality hypothesis. This demand is
consistent with what we expect from classical intuition: physical quantities have
predefined values which are only revealed by the measurement process. If these values exist
prior to the measurement, how can they depend on some choice made at the moment
of the measurement?

It happens that noncontextual hidden-variable models cannot reproduce quantum
statistics. This result is known as the Bell-Kochen-Specker theorem. The result was first
proven by Kochen and Specker, and Bell pointed out the assumption of noncontextuality,
which was so natural that Kochen and Specker assumed it with no explicit discussion.
A huge number of proofs can be found in the literature, much simpler than the pioneering
proof. One of the most common ways to provide a simple proof of this theorem is
to use the so-called noncontextuality inequalities. They are linear inequalities involving
the probabilities of certain outcomes of the joint measurement of compatible observables
that must be obeyed by any noncontextual hidden-variable model and can be violated
by quantum theory with a particular choice of state and observables.

One of the reasons for studying quantum contextuality and quantum nonlocality is
the belief that they are essential for understanding quantum theory the same way we
understand special relativity. Special relativity can be derived from two simple physical
principles: the speed of light is constant and physics is the same for reference frames in
uniform relative motion. We cannot yet do the same for quantum theory, and this is one of
the most seductive scientific challenges of recent times. The starting point is to assume
general probabilistic theories allowing for probability distributions that are more general
than those that arise from Kolmogorov's axioms, and even from quantum theory, and the
goal is to find principles that pick out quantum theory from this landscape of possible
theories. There are many ideas on how to do this, and at least three different approaches
to the problem stand out.

The first one consists of reconstructing quantum theory as a purely operational
probabilistic theory that follows from some set of axioms. Imposing a small number of
reasonable physical principles, it is possible to prove that the only consistent probabilistic
theory is quantum theory [Har01, Har11, MM11, CDP11]. Although really successful, this
approach does not resolve the issue completely, especially because some of the principles
imposed do not sound so natural. Another drawback is that there are interesting and
important quantum effects in simple systems (as opposed to composite ones) that cannot
be addressed this way.

In the second approach, instead of trying to reconstruct quantum theory, the idea is
to understand what physical principles explain the nonlocal character of quantum theory.
Many different principles have been proposed, the most important being non-triviality
of communication complexity, Information Causality, Macroscopic Locality and Local
Orthogonality [vD12, PPK+09, NW09, OW10]. None of them is known to solve the
problem completely, but many interesting results have been found so far.

The third approach consists of identifying principles that explain the set of quantum
contextual correlations without restrictions imposed by a specific experimental scenario.
The belief that identifying the physical principle responsible for quantum contextuality
provides a higher probability of success than previous approaches is based on two
observations. On the one hand, when focusing on quantum contextuality we are just considering
a natural extension of quantum nonlocality which is free of certain restrictions
(composite systems, space-like separated tests with multiple observers, entangled states) which
play no role in the rules of quantum theory, although they are crucial for many
important applications, especially in communication protocols (see, for example, references
[HHHH09, BBC+93] and other references therein), and played an important role
in the historical debate on whether or not quantum theory is a complete theory. On the
other hand, it is based on the observation that, while calculating the maximum value of
quantum correlations for nonlocality scenarios is a mathematically complex problem,
calculating the maximum contextual value of quantum correlations for an arbitrary scenario
is the solution of a semidefinite program [CSW14, Lov95]. The difficulties in
characterizing quantum nonlocal correlations are due to the mathematical difficulties associated
with the extra constraints resulting from enforcing a particular labeling of the events in terms
of parties, local settings, and outcomes, rather than to a fundamental difficulty related to
the principles of quantum theory.

Within this line of research, the most promising candidate for being the fundamental
principle of quantum contextuality is the Exclusivity principle (E principle, for short),
which can be stated as follows:

The sum of the probabilities of a set of pairwise exclusive events cannot exceed 1.


By itself, the Exclusivity principle singles out the maximum quantum value for some
important Bell and noncontextuality inequalities. We can get better results if we apply
the E principle to more sophisticated scenarios. This happens because this principle
exhibits activation effects: a distribution satisfying this principle does not necessarily
satisfy it when combined with other distributions. Activation effects can be used to
prove that the Exclusivity principle singles out the set of quantum distributions for the
simplest noncontextuality inequality. It is still not known whether the Exclusivity principle
solves the problem of explaining quantum contextuality completely, but many results
have been proven that support the conjecture that it might. The main purpose of this
thesis is to discuss in detail the situations in which the E principle can be used to rule
out distributions outside the quantum set.
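
The activation effect mentioned above can be illustrated with a small numerical sketch. The setting below is an assumption chosen for illustration and is not taken from the text: five events whose exclusivity relations form a pentagon, each with probability 1/2, and two independent copies of this experiment, in which two joint events are regarded as exclusive when they are exclusive in at least one copy. The function names are hypothetical. The sketch searches all sets of pairwise exclusive events by brute force and reports the largest sum of probabilities.

```python
from fractions import Fraction
from itertools import product


def pentagon_exclusive(i, j):
    """Two events of the 5-cycle are exclusive when they are neighbours."""
    return (i - j) % 5 in (1, 4)


def max_exclusive_sum(events, exclusive, prob):
    """Largest total probability over all sets of pairwise exclusive events."""
    best = Fraction(0)

    def extend(candidates, total):
        nonlocal best
        best = max(best, total)
        for idx, v in enumerate(candidates):
            # Keep only later events that are exclusive to v; by construction
            # they are already exclusive to every event chosen so far.
            rest = [u for u in candidates[idx + 1:] if exclusive(u, v)]
            extend(rest, total + prob[v])

    extend(list(events), Fraction(0))
    return best


# One copy: five events, each with probability 1/2 (illustrative assumption).
single = list(range(5))
p_single = {v: Fraction(1, 2) for v in single}
bound_single = max_exclusive_sum(single, pentagon_exclusive, p_single)

# Two independent copies: joint events (i, j) with product probabilities,
# taken as exclusive when they are exclusive in at least one of the copies.
double = list(product(range(5), repeat=2))
p_double = {(i, j): p_single[i] * p_single[j] for i, j in double}


def joint_exclusive(a, b):
    return pentagon_exclusive(a[0], b[0]) or pentagon_exclusive(a[1], b[1])


bound_double = max_exclusive_sum(double, joint_exclusive, p_double)

print(bound_single)  # 1: no violation with a single copy
print(bound_double)  # 5/4: five pairwise exclusive joint events sum to more than 1
```

Under these assumptions the single-copy distribution satisfies the E principle, while the two-copy distribution does not, which is the kind of behavior the term "activation" refers to.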

In chapter 1 we start the discussion by defining the generalized probability theories
that are suitable for the description of states and measurements in a physical system
[Bar07, BW12]. We will try to keep the assumptions as general as possible, but for the
purposes of this work it is sufficient to consider a class of theories that satisfy further
restrictions that do not have a physical meaning and will be made solely to simplify the
description. Nonetheless, our framework is general enough to include as special cases
the mathematical structure of finite dimensional quantum theory and classical probability
theory with finite sample spaces.

In chapter 2 we discuss in detail the assumption of noncontextuality. We present two
different approaches, both connected with graph theory: the compatibility-hypergraph
approach and the exclusivity-graph approach [CSW14]. The graph-theoretical formulation
of quantum contextuality supplies new tools to understand the differences between
quantum and classical theories and also the differences between quantum theory and
more general theories [Cab13b, Yan13, ATC14].

The pioneering proof of Kochen and Specker is outside the scope of this thesis, but
we present it in appendix A. There the reader can find a brief discussion of the first
attempts to prove the impossibility of hidden-variable models compatible with quantum
theory and other interesting state-independent proofs of the Kochen-Specker theorem.

In chapter 3 we prove the recent results supporting the conjecture that the E principle
might explain the set of quantum distributions in the exclusivity-graph approach to
quantum contextuality. The most important results are the ones we have proven in
reference [ATC14]. There we show that the Exclusivity principle singles out the entire set of
quantum correlations associated with any exclusivity graph, assuming the set of quantum
correlations for the complementary graph. Moreover, for self-complementary graphs,
the Exclusivity principle by itself (i.e., without further assumptions) excludes any set
of correlations strictly larger than the quantum set. Finally, for vertex-transitive graphs,
the Exclusivity principle singles out the maximum value for the quantum correlations
assuming only the quantum maximum for the complementary graph. We also show that
important results can be proven if we use graph operations other than complementation,
and as a consequence we show that the Exclusivity principle explains the quantum
maximum for all vertex-transitive graphs with 10 vertices, except two (see footnote 1). These
results show that the Exclusivity principle goes beyond any other proposed principle towards the
objective of singling out quantum correlations.

Since we made no original contribution to Bell inequalities, the concept of Bell
scenarios will only be introduced in appendix B. Bell scenarios provide a natural way
to enforce the noncontextuality assumption, since in these situations the experiment is
designed in such a way that the choices of the different compatible observables to be
measured are made in different regions of space, within a time interval that forbids any
signal to be sent from one region to the other. Since no signal is sent, the choice of
what is going to be measured in one part cannot disturb what happens in the other,
which guarantees that the model is noncontextual. In this situation, we say that the
model is local and the noncontextuality assumption is usually referred to as the locality
assumption.

Although nowadays we may see quantum nonlocality as a special case of quantum
contextuality, historically the discussion of nonlocality in quantum theory preceded the
discussion of its contextual character. Quantum nonlocality puzzled the famous
trio Einstein, Podolsky, and Rosen, who discussed this strange property of quantum
theory in their pioneering paper "Can Quantum-Mechanical Description of Physical Reality
Be Considered Complete?" in 1935 [EPR35]. They started one of the greatest debates in the
foundations of physics and the philosophy of science in general, which is still fruitful today.

The first one to provide a proof of the impossibility of local hidden-variable models
was John Bell, in 1964 [Bel64]. He demonstrated that if the statistics of joint
measurements on a pair of qubits in the singlet state were given by a hidden-variable model,
a linear inequality involving the corresponding probabilities should be satisfied. A simple
choice of measurements leads to a violation of this inequality, and hence the model
cannot reproduce the quantum statistics.
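
As a rough numerical illustration of such a violation, the sketch below uses the commonly quoted correlation form of Bell's 1964 inequality, |E(a,b) - E(a,c)| <= 1 + E(b,c), which is an assumption about the form of the inequality and is not stated in the text. It relies on the standard singlet prediction E(a,b) = -cos(a - b) for coplanar measurement directions. The angles, the function names, and the simple linear local model used for comparison are all illustrative choices.

```python
import math


def singlet_correlation(a, b):
    """Quantum correlation E(a, b) = -cos(a - b) for the singlet state."""
    return -math.cos(a - b)


def linear_local_correlation(a, b):
    """A local hidden-variable correlation, E = -1 + 2|a - b|/pi, for |a - b| <= pi."""
    return -1 + 2 * abs(a - b) / math.pi


def bell_gap(corr, a, b, c):
    """|E(a,b) - E(a,c)| - (1 + E(b,c)); a positive value violates the inequality."""
    return abs(corr(a, b) - corr(a, c)) - (1 + corr(b, c))


# Illustrative choice of three coplanar directions, 60 degrees apart.
a, b, c = 0.0, math.pi / 3, 2 * math.pi / 3
quantum_gap = bell_gap(singlet_correlation, a, b, c)
local_gap = bell_gap(linear_local_correlation, a, b, c)

print(quantum_gap)  # about 0.5: the singlet correlations violate the inequality
print(local_gap)    # about 0: the local model saturates it without violating it
```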

Many similar inequalities have been derived since Bell's work. Because of his pioneering paper,
any inequality derived under the assumption of a local hidden-variable model is called
a Bell inequality. Quantum theory violates these inequalities in many situations. Besides
the insight they give into the foundations of quantum theory, those violations are also connected
to many interesting applications.

The quest for a principle that explains the set of quantum distributions in Bell
scenarios has been very fruitful. For completeness, a brief discussion can be found in appendix B.

We will state, and sometimes prove, many results that can be found in the literature.
These results will be referred to as Theorems. The original results of the author and
collaborators will be referred to as Propositions. We will use a large number of tools
from many different areas of mathematics and physics. This makes a proper introduction
of some subjects impractical. Typically, the necessary mathematical definitions will be
given in the text, but neither their consequences nor other prerequisite concepts will
find room in the text. We list the concepts we will need, along with references where a
proper discussion can be found.

Footnote 1. If the E principle explains the quantum bound for one of them, the result of Yan [Yan13]
proves that the E principle also explains the quantum bound for the other.


1. Linear algebra: vector spaces, linear maps, matrices, bases, inner products,
orthogonal complements, tensor products; finite dimensional Hilbert spaces. An
introduction to the subject can be found in references [HK61, Lan87];


2. Convex geometry: we assume that the reader is familiar with the notions of convex
sets, convex sums, convex cones, polytopes and H-descriptions. The reader can
learn about these subjects in reference [Roc97];


3. Basic probability theory: finite sample spaces, σ-algebras and measures. We give
a brief introduction in section 1.4 and suggest references [SW95, GS01, Jam04]
for a more complete treatment;


4. Quantum theory in finite dimension. We present the mathematical aspects in
section 1.5. We recommend references [FLS65, CTDL77, Per95, NC00, Gri05, ABT11];


5. Ordered linear spaces and order unit spaces [Jam70];

6. Category theory: morphisms, opposite category, symmetric monoidal category. All
these definitions can be found in reference [Mac98];


7. Sheaf theory. We define very briefly the objects we use and recommend reference
[MM92] for a complete treatment.


We are very grateful to all who spent some of their time reading this work. Any
comments, questions or suggestions are welcome.


Barbara Amaral
barbaraamaral@gmail.com
First Chapter

Generalized Probability Theories


In this chapter we study generalized probability theories that can be used to describe
states and measurements in a physical system. We will not focus on any particular
kind of system. Our intention is to discuss only the abstract mathematical structure
behind the description and the consequences of assuming a particular type of
theory. A number of requirements imposed by physical reasoning must be obeyed by all
theories in this framework, and for now we will try to keep the assumptions as general as
possible. For the purposes of this work it is sufficient to consider a class of theories that
satisfy further restrictions that do not have a physical meaning and will be made solely
to simplify the description. Nonetheless, our framework is general enough to include
as special cases the mathematical structure of finite dimensional quantum theory and
classical probability theory with finite sample spaces, the subjects of sections 1.5 and
1.4, respectively. In section 1.1 we define states and measurements in a physical system
and in section 1.2 we discuss the mathematical description of a multipartite system. A
mathematical formalization of these concepts is presented in section 1.3. We finish this
chapter with general properties of the theories in section 1.6.


1.1 States and Measurements 

As we said above, our purpose in this chapter is to find a suitable mathematical structure
that we can apply to the description of experiments carried out on a hypothetical physical
system. We follow the ideas presented by Barrett in reference [Bar07].

Our first assumption is about the nature of the experiments that can be performed on
this system. We assume that there are two kinds of experiments available: preparations
and operations. Another important requirement is that these experiments be repeatable:
every preparation and every operation can be done as many times as we want, and we
can use several repetitions of a given procedure to count relative frequencies. For each
operation there may be several different outcomes, each occurring with a well-defined
probability for a given preparation. Preparations can be compared through their statistics
in relation to the given operations, and these statistics define a state.

Definition 1. Two preparations are equivalent if they give the same probability
distribution for all available operations. The equivalence class of preparations is called a
state.

Definition 2. A set of operations is called informationally complete or tomographic if 
the list of probabilities for the outcomes of these operations completely specifies the 
state of the system. 

For every system there is a set of tomographic operations. In the worst case scenario,
we can take the entire set of operations as a tomographic set. In general this is
unnecessary, since only a small subset of the available operations is needed to describe the
state completely. The set of tomographic operations is not unique and we will not
assume it to be a minimal set: it may happen that, after removing some
operations, we still have a tomographic set. This set is not always finite, but we will only
consider the cases in which a finite tomographic set exists.

Assumption 1. The state of the system can be completely specified by listing the 
probabilities of the outcomes of a finite set of tomographic operations each of them with 
a finite set of possible outcomes. 

This restriction is not a physical requirement and it is really easy to come up with real 
physical systems that require an infinite set of tomographic operations or tomographic 
operations with an infinite number of outcomes. We are just narrowing down the kind 
of problems we will deal with in this work. 

If we fix the set of tomographic operations $\{M_1, \ldots, M_n\}$, each $M_i$ with outcomes
$\{1, 2, \ldots, m_i\}$, every state can be represented by a list of probabilities:

$$
P = \begin{pmatrix}
p(1|M_1) \\ \vdots \\ p(m_1|M_1) \\
p(1|M_2) \\ \vdots \\ p(m_2|M_2) \\
\vdots \\
p(1|M_n) \\ \vdots \\ p(m_n|M_n)
\end{pmatrix} \in \mathbb{R}^d, \tag{1.1}
$$

in which $p(i|j)$ is the probability of outcome $i$ given that the operation $j$ was applied
and $d = \sum_{i=1}^{n} m_i$. Since the entries represent probabilities, we have $p(i|j) \geq 0$ and

$$\sum_i p(i|j) = 1$$

for every tomographic operation $j$. Nevertheless, it will be convenient to use also
subnormalized states with

$$\sum_i p(i|j) = p, \tag{1.2}$$

where $0 \leq p \leq 1$ and $p$ is independent of the tomographic operation $j$. The value $p$ is
called the norm of the state $P$ and will be denoted by $|P|$. These subnormalized states
have a physical interpretation: suppose an operation $j$ is performed on a normalized state
and an outcome $i$ is obtained with probability $p$ less than one. There is a subnormalized
state of the form (1.2) associated with this outcome, and each entry $p(k,i|l,j) = p(i|j)\,p(k|l)$
of this state corresponds to the probability of obtaining outcome $i$ in operation $j$
followed by outcome $k$ in the tomographic operation $l$.

With this interpretation, the vector with all entries equal to zero, denoted by $\mathbf{0}$, is
an allowed (subnormalized) state of every system. This state can be prepared in the
following way: suppose we prepare a state for which outcome $i$ of operation $M$ has
probability zero; each entry $p(k|j)$ of the state of the system associated with this outcome
is the probability of getting $i$ in the first operation and $k$ in the tomographic operation
$j$, and since outcome $i$ is a zero probability event, all the entries of this vector are zero.


Assumption 2. For each system the set of allowed normalized states is closed and
convex. The complete set of states $\mathcal{S}$ is the convex hull of the set of allowed normalized
states and $\mathbf{0}$. The set $\mathcal{S}$ is called the state space of the system.


Definition 3. The extremal points of the state space $\mathcal{S}$ are called pure states. The
points that are not extremal are called mixed states, and can be written as a convex sum
of pure states. Convex sums are also called mixtures.


Definition 4. We say that a state is dispersion free if it provides definite outcomes for 
all measurements, that is, if for every measurement there is one outcome with probability 
one. 


If a model admits dispersion free states, then these states are pure. The converse is
not always true: some models may admit pure states that are not dispersion free. This
is the case of quantum theory, as we will see in section 1.5.
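
The definitions above can be made concrete with a minimal sketch. It assumes a hypothetical toy system with two tomographic operations, each with two outcomes, and stores a state as one list of probabilities per operation. The function names are illustrative, and dispersion freeness is only checked against the tomographic operations, which is a simplification of Definition 4.

```python
from fractions import Fraction as F


def norm(state):
    """Common value of sum_i p(i|j); raises if it depends on the operation j."""
    totals = {sum(dist) for dist in state}
    if len(totals) != 1:
        raise ValueError("sum of probabilities depends on the tomographic operation")
    return totals.pop()


def mix(weight, state_a, state_b):
    """Convex sum weight * A + (1 - weight) * B, entry by entry."""
    return [[weight * x + (1 - weight) * y for x, y in zip(da, db)]
            for da, db in zip(state_a, state_b)]


def is_dispersion_free(state):
    """True when every tomographic operation has an outcome with probability one."""
    return all(max(dist) == 1 for dist in state)


# Two dispersion free states of the toy system (hypothetical example).
state_a = [[F(1), F(0)], [F(1), F(0)]]
state_b = [[F(1), F(0)], [F(0), F(1)]]

# Their equal mixture is still normalized but no longer dispersion free.
mixture = mix(F(1, 2), state_a, state_b)

print(norm(mixture))                # 1
print(is_dispersion_free(state_a))  # True
print(is_dispersion_free(mixture))  # False
```

The mixture of two distinct dispersion free states assigns probability 1/2 to both outcomes of the second operation, in line with the statement that dispersion free states cannot be written as nontrivial mixtures.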


When an operation $M$ is performed, each outcome $i$ is associated with a transformation
$f_i$ of the state of the system:

$$P \mapsto f_i(P). \tag{1.3}$$


The entry $p(k|j)$ of $f_i(P)$ is the probability of obtaining outcome $i$ in operation $M$
followed by outcome $k$ in the tomographic operation $j$. Operations with only one outcome
preserve normalization. If the transformation is associated with an outcome that occurs
with probability $p < 1$, then it decreases the norm of the state by a factor of $p$.


Definition 5. Operations with more than one outcome are called measurements.


Assumption 3. We require that the transformations preserve mixtures. This means
that if

$$P = \sum_i p_i P_i \tag{1.4a}$$

then

$$f(P) = \sum_i p_i f(P_i). \tag{1.4b}$$

The physical interpretation of the vector $\mathbf{0}$ requires that

$$f(\mathbf{0}) = \mathbf{0}. \tag{1.4c}$$


In fact, the state vector $\mathbf{0}$ is prepared when we condition on an outcome $i$ of a
measurement $j$ that happens with probability zero. Let $f$ be associated with outcome $k$ of some
measurement $l$. Then the entry $p(r|s)$ of $f(\mathbf{0})$ is the probability of obtaining outcome
$i$ in the measurement $j$, followed by outcome $k$ in measurement $l$, followed by outcome
$r$ in tomographic measurement $s$. Since outcome $i$ is a zero probability event in the first
place, all these entries are zero and equation (1.4c) follows.

The conditions above imply that we can take $f$ to be linear [Bar07].

Theorem 1. The transformation $f$ associated with an operation acting on the state of a
physical system can be extended to a linear operator on $\mathbb{R}^d$.

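Proof. Equations (1.4) imply that $f(rP) = r f(P)$ for all $P \in \mathcal{S}$ and $0 \leq r \leq 1$. In fact, under
these conditions

$$f(rP) = f\big(rP + (1 - r)\mathbf{0}\big) = r f(P) + (1 - r) f(\mathbf{0}) = r f(P). \tag{1.5}$$

Suppose $P \in \mathcal{S}$ and $r > 1$. If $rP = P' \in \mathcal{S}$, then $f(rP) = r f(P)$, since $f(P) = f\big(\tfrac{1}{r} P'\big)$
and, by equation (1.5), $f\big(\tfrac{1}{r} P'\big) = \tfrac{1}{r} f(P')$. If $rP \notin \mathcal{S}$, we can extend $f$ using the rule

$$f(rP) = r f(P).$$

Let $\mathcal{S}_+$ be the set of vectors of the form $rP$, $P \in \mathcal{S}$, $r \geq 0$. This set is a convex cone
and $f(rP) = r f(P)$ for all $P \in \mathcal{S}_+$ and $r \geq 0$. It is also true that

$$f\Big(\sum_i r_i P_i\Big) = \sum_i r_i f(P_i), \quad \forall\, P_i \in \mathcal{S}_+,\ r_i \geq 0. \tag{1.6}$$

To prove this, let $P_i = s_i P'_i$, $s_i \geq 0$, $P'_i \in \mathcal{S}$ and $c = \sum_i r_i s_i$. Then

$$f\Big(\sum_i r_i P_i\Big) = f\Big(c \sum_i \frac{r_i s_i}{c} P'_i\Big)$$

and since $\sum_i \frac{r_i s_i}{c} P'_i \in \mathcal{S}$,

$$f\Big(c \sum_i \frac{r_i s_i}{c} P'_i\Big) = c\, f\Big(\sum_i \frac{r_i s_i}{c} P'_i\Big) = c \sum_i \frac{r_i s_i}{c} f(P'_i) = \sum_i r_i s_i f(P'_i) = \sum_i r_i f(P_i).$$

Now we prove that equation (1.6) is also true if the coefficients $r_i$ are real. Let
$Q \in \mathcal{S}_+$ be such that

$$Q = \sum_i r_i P_i, \quad P_i \in \mathcal{S}_+,\ r_i \in \mathbb{R}.$$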



We can rewrite the above expression as

$$Q + \sum_{r_i < 0} |r_i| P_i = \sum_{r_i > 0} r_i P_i$$

and applying $f$ to both sides of this equation we get

$$f(Q) + \sum_{r_i < 0} |r_i| f(P_i) = \sum_{r_i > 0} r_i f(P_i),$$

which implies

$$f(Q) = \sum_i r_i f(P_i).$$

This proves that $f$ is linear in $\mathcal{S}_+$. If $Q$ belongs to the subspace spanned by $\mathcal{S}_+$, $f(Q)$
can be defined uniquely by linear extension. The action on the orthogonal complement
of this subspace is arbitrary and we can define it to be linear. Then $f$ can be extended
linearly to the rest of the vector space $\mathbb{R}^d$. □

This result implies that every transformation can be written as

$$f(P) = MP \tag{1.7}$$

where $M$ is a matrix acting on $\mathbb{R}^d$.

An operation is associated to a set of matrices $\{M_i\}$, each $M_i$ corresponding to
an outcome $i$ of this operation. The subnormalized state associated to outcome $i$ is
$M_i P \in \mathcal{S}$ and the unnormalized probability of $i$ is $|M_i P|$. This means that if $P$ is
normalized, the probability of outcome $i$ is $|M_i P|$.
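To make the role of the matrices $M_i$ concrete, the following is a minimal sketch, not part of the original text. It assumes a classical-like system whose state is described by a single tomographic measurement with three outcomes, so that $|P|$ is simply the sum of the entries of $P$. The matrices `M0` and `M1` and the helper names are hypothetical, chosen only for illustration.

```python
# Sketch: an operation {M_i} acting on a state vector P.
# Assumption (not stated in this section): |P| is the sum of the entries
# of P, as for a system with a single tomographic measurement.

def mat_vec(M, P):
    """Apply the matrix M (a list of rows) to the vector P."""
    return [sum(m * p for m, p in zip(row, P)) for row in M]

def norm(P):
    """Assumed norm |P|: the total probability carried by the vector."""
    return sum(P)

# Hypothetical two-outcome operation on a three-level system:
# outcome 0 keeps level 0, outcome 1 keeps levels 1 and 2.
M0 = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
M1 = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]
operation = [M0, M1]

P = [0.5, 0.3, 0.2]  # a normalized state

substates = [mat_vec(M, P) for M in operation]  # subnormalized states M_i P
probs = [norm(Q) for Q in substates]            # outcome probabilities |M_i P|
```

In this example each outcome occurs with probability one half, and the probabilities add up to one, as required of a valid operation on a normalized state.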

As one should expect, not every set of matrices $\{M_i\}$ corresponds to a valid operation
on the system, since some physical requirements must be satisfied.

Constraint 1. If a set of matrices $\{M_i\}$ represents an operation, the following conditions
must hold:

1. Positivity: $0 \le \frac{|M_i P|}{|P|} \le 1$, $\forall i$, $\forall P \in \mathcal{S} \setminus \{0\}$;

2. Normalization: $\sum_i \frac{|M_i P|}{|P|} = 1$, $\forall P \in \mathcal{S} \setminus \{0\}$;

3. State preservation: $M_i P \in \mathcal{S}$, $\forall P \in \mathcal{S}$;

4. Complete state preservation: Each transformation $M_i$ must result in allowed states
when it acts on a system that is a part of a larger multipartite system.


Item 1 of constraint 1 must be satisfied because the probability of an outcome is
a real number between zero and one. Item 2 follows from the fact that the sum of
the probabilities of all outcomes must be one. Items 3 and 4 follow from the fact that
any transformation must take an allowed state to another allowed state, whether we
consider the system alone or as a part of a larger system composed of several parties.
We will talk about item 4 again in section 1.2.

Assumption 4. For each system there is a set $\mathcal{T}$ of allowed transformations. This set
is convex and includes the transformation that takes all $P$ to the vector $0$.

Definition 6. An operation is a set of allowed transformations $\{M_i\}$, $M_i \in \mathcal{T}$, satisfying
constraint 1.

The set $\mathcal{T}$ can be viewed as a set of possible outcomes for the available operations,
each outcome represented by a matrix $M_i \in \mathcal{T}$. Distinct operations may share some
outcomes, since a matrix $M_i$ can appear in different measurements. The probability of
a given outcome does not depend on the measurement in which it appears.

Definition 7. The pair $(\mathcal{S}, \mathcal{T})$ is called a probabilistic model. A probability theory is
a collection of probabilistic models.

The same model can describe different systems. This happens because the description 
of a real physical system also depends on how we connect the real experiments with the 
mathematical objects in the model. It is also possible that the same system is described 
by apparently different models. For example, we could use a different set of tomographic 
measurements and obtain a model in a different vector space and consequently, a different 
set of matrices representing allowed operations. This difference is irrelevant, since the 
physics represented by each of them is the same. 

Definition 8. Two probabilistic models $(\mathcal{S}_1, \mathcal{T}_1)$ and $(\mathcal{S}_2, \mathcal{T}_2)$ are equivalent if there
exist linear bijections

$$\xi: \mathcal{S}_1 \to \mathcal{S}_2$$

$$\zeta: \mathcal{T}_1 \to \mathcal{T}_2$$

such that

$$|MP| = |\zeta(M)\,\xi(P)|$$

for every $M \in \mathcal{T}_1$ and every $P \in \mathcal{S}_1$.

Definition 9. If two models belong to the same equivalence class under the equivalence
above, we say that they describe the same type of system.

All models describing a given type of system are equally good. Some of them might
be more practical or more appropriate in a particular situation, but the choice of one
instead of the others is just a matter of taste.
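A simple way to see two equivalent models is to relabel the entries of the state vectors. The sketch below is an illustration under stated assumptions, not part of the original text: it assumes that $|P|$ is the sum of the entries of $P$, and it uses a hypothetical permutation `perm` to build the bijections, with `xi` permuting the entries of a state and `zeta` conjugating a matrix by the same permutation.

```python
# Sketch: two models related by a relabeling of the vector entries.
# Assumption: |P| is the sum of the entries of P.

def mat_vec(M, P):
    """Apply the matrix M (a list of rows) to the vector P."""
    return [sum(m * p for m, p in zip(row, P)) for row in M]

def norm(P):
    return sum(P)

perm = [2, 0, 1]  # hypothetical relabeling of the three entries

def xi(P):
    """Bijection on states: permute the entries of P."""
    return [P[perm[i]] for i in range(len(P))]

def zeta(M):
    """Bijection on transformations: conjugate M by the same permutation."""
    n = len(M)
    return [[M[perm[i]][perm[j]] for j in range(n)] for i in range(n)]

# Hypothetical transformation with non-negative entries and column sums <= 1.
M = [[0.5, 0.1, 0.0],
     [0.0, 0.25, 0.0],
     [0.2, 0.0, 1.0]]
P = [0.2, 0.3, 0.5]

lhs = norm(mat_vec(M, P))            # |MP| in the first model
rhs = norm(mat_vec(zeta(M), xi(P)))  # |zeta(M) xi(P)| in the second model
```

Both quantities coincide, since permuting the entries of a vector does not change the sum of its entries.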

1.1.1 Repeatability 

In the beginning of this section we mentioned that experiments must be repeatable. This
means that every preparation and operation we consider can be done as many times as
we want in the same conditions, which allows us to define the statistics of every sequence
of experiments. The word repeatability will be used again with a different meaning in
the definition of repeatability of outcomes. We apologize for the inconvenient use of the
same word for both concepts, but we have no better option in either case.

1 We do not assume that $\mathcal{S}_1$ and $\mathcal{S}_2$ are subsets of the same real vector space, that is, the number of
entries in the vectors representing the states does not have to be the same.

Definition 10. A measurement $i$ has repeatable outcomes if every time this measurement
is performed and an outcome $k$ is obtained, a subsequent measurement of $i$ gives
outcome $k$ with probability one.
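The definition can be checked numerically for a measurement given by matrices $\{M_k\}$. The following sketch is only illustrative: it assumes that $|P|$ is the sum of the entries of $P$, and it tests the condition on a finite, hand-picked list of states, so a positive answer is evidence rather than a proof. The function name `has_repeatable_outcomes` and both example measurements are hypothetical.

```python
# Sketch: testing repeatability of outcomes on a finite sample of states.
# Assumption: |P| is the sum of the entries of P.

def mat_vec(M, P):
    return [sum(m * p for m, p in zip(row, P)) for row in M]

def norm(P):
    return sum(P)

def has_repeatable_outcomes(operation, states, tol=1e-12):
    """Check that outcome k followed by the same measurement gives k again."""
    for P in states:
        for k, Mk in enumerate(operation):
            Pk = mat_vec(Mk, P)
            pk = norm(Pk)
            if pk <= tol:
                continue  # outcome k never occurs for this state
            for j, Mj in enumerate(operation):
                p_j_after_k = norm(mat_vec(Mj, Pk)) / pk
                expected = 1.0 if j == k else 0.0
                if abs(p_j_after_k - expected) > tol:
                    return False
    return True

# A sharp measurement of a classical bit, and a noisy one that ignores the state.
sharp = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
noisy = [[[0.5, 0], [0, 0.5]], [[0.5, 0], [0, 0.5]]]

states = [[0.3, 0.7], [1.0, 0.0], [0.5, 0.5]]
sharp_ok = has_repeatable_outcomes(sharp, states)
noisy_ok = has_repeatable_outcomes(noisy, states)
```

The sharp measurement passes the test, while the noisy one fails: after obtaining an outcome, a second run returns either outcome with probability one half.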

In this chapter we still allow measurements with non-repeatable outcomes. In some
cases it might be important to restrict the discussion to the case of repeatable outcomes,
and we will do so later, when we talk about contextuality.


1.1.2 Compatibility for outcome-repeatable measurements 

One of the implications of a more general theory for computing probabilities than the
usual classical probability theory is that in some cases there is not a well defined
probability for the results of all measurements in a given set. When this global probability
distribution exists for all states, we say that the measurements are compatible. This is
not new for the reader familiar with quantum theory, where non-compatibility is the rule,
not the exception.

Definition 11. A set of outcome-repeatable measurements $\{j_1, \ldots, j_n\}$ is compatible if
there is another measurement $j$ with outcomes $\{1, \ldots, m\}$ and functions $f_1, \ldots, f_n$ such
that the possible outcomes of each $j_s$ are $f_s(\{1, \ldots, m\})$ and

$$p(i|j_s) = \sum_{k \in f_s^{-1}(i)} p(k|j). \tag{1.8}$$

The measurement $j$ is called a refinement of each $j_i$, and each $j_i$ is called a coarse
graining of $j$.


If the measurements are compatible, the probability of a set of outcomes $i_1, \ldots, i_n$
for $j_1, \ldots, j_n$ is well defined and it is equal to the probability of the outcomes in
$\bigcap_k f_k^{-1}(i_k)$ for measurement $j$.
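Equation (1.8) and the joint probability of compatible measurements can be sketched in a few lines. The example below is illustrative only: the distribution `p_j` of the refinement and the functions `f1` and `f2` are hypothetical, and the helper names are not taken from the text.

```python
# Sketch: coarse graining a refinement j into measurements j_1 and j_2.
from collections import defaultdict

def coarse_grain(p_refined, f):
    """p(i|j_s): sum of p(k|j) over all k with f(k) == i, as in equation (1.8)."""
    out = defaultdict(float)
    for k, pk in p_refined.items():
        out[f(k)] += pk
    return dict(out)

def joint_probability(p_refined, fs, outcomes):
    """Probability of outcomes (i_1, ..., i_n): sum of p(k|j) over the
    intersection of the preimages f_s^{-1}(i_s)."""
    return sum(pk for k, pk in p_refined.items()
               if all(f(k) == i for f, i in zip(fs, outcomes)))

# Hypothetical refinement j with outcomes 1..4.
p_j = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
f1 = lambda k: k % 2       # j_1 reports the parity of k
f2 = lambda k: int(k > 2)  # j_2 reports whether k is in the upper half

p_j1 = coarse_grain(p_j, f1)
p_j2 = coarse_grain(p_j, f2)
p_both = joint_probability(p_j, [f1, f2], (1, 1))  # only k = 3 contributes
```

Here the joint probability of obtaining outcome 1 in both coarse grainings is the probability of the single refined outcome $k = 3$.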

The notion of compatibility is essential in quantum theory, especially in the problems
of non-contextuality we will present in chapter 2. It is connected to the idea of
"measurements that can be performed at once". If a set of measurements is compatible, they
can be measured jointly on the same individual system without disturbing the results of
each other. In practice, to measure all of them at the same time we apply the measurement
$j$ of definition 11 and then use the functions $f_i$ to find out the outcomes of each
$j_i$. Compatible measurements can be made simultaneously or in any order and can be
repeated any number of times in the same system, and repeatability of the results must
be preserved. We will come back to this subject many times in the text, and in section
1.5 we will see how non-compatible measurements appear in quantum theory.


1.2 Multipartite systems

In this section we will see how we can describe multipartite systems in general probability 
theories. As for the simple systems, the probability theories used for composite systems 
must obey some requirements that come from natural physical assumptions. 

Assumption 5. For every system composed of several parties, we assume that operations
that act on only one of the parties are allowed. These operations are called local
operations.

Although the parties do not need to be spatially separated, this is the case most of
the time when we deal with multipartite systems. This motivates the use of the word local
for the operations acting on only one party of the system.

Assumption 6 (Local operations commute). Suppose that for each subsystem i of a 
multipartite system, an operation Mi is performed. Then the state of the composite 
system after the sequence of operations Mi does not depend on the particular order in 
which the operations were applied. 

This assumption means that local operations can be regarded as performed simultaneously
on each subsystem. This implies that for each measurement the joint probabilities

$$p(r_1, r_2, \ldots, r_n | M_1, M_2, \ldots, M_n)$$

are well defined, where $r_i$ is the outcome of measurement $M_i$ on party $i$.

An important corollary of assumption 6 is that no-signaling holds for all composite
systems [Bar07]. This property states that none of the parties can signal its choice of
input to the others. Physically, this is a reasonable restriction: since there may be a large
spatial separation between the parties, signaling between them would potentially require
faster-than-light communication, which would violate the most fundamental principle of
special relativity.

Corollary 1 (No-signaling). If an operation was performed on system $i$, it is not possible
to get information about which operation was performed by measuring another system
$j$.

Proof. Suppose an operation $M_i$ was performed on system $i$ and afterwards we apply
operation $M_j$ on system $j$. By assumption 6, the probability of getting outcome $k$
for measurement $M_j$ in this sequence of operations is equal to the probability of this
outcome if $M_j$ was performed first and then $M_i$:

$$p(k|M_i, M_j) = p(k|M_j),$$

which implies that $p(k|M_i, M_j)$ does not depend on measurement $M_i$. This implies that
no information on $M_i$ can be gained by any measurement in system $j$. □
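The no-signaling condition can be tested directly on a table of joint probabilities. The sketch below is an illustration and not part of the original text: it assumes two parties, each with two possible measurements (inputs `x` and `y`) and two outcomes (`a` and `b`), and stores $p(a, b | x, y)$ in a dictionary keyed by `(a, b, x, y)`. Both example distributions are hypothetical.

```python
# Sketch: checking no-signaling for a bipartite table p(a, b | x, y).
from itertools import product

def marginal_a(p, a, x, y):
    """Probability of outcome a for party 1, summing over the outcomes of party 2."""
    return sum(p[(a, b, x, y)] for b in (0, 1))

def marginal_b(p, b, x, y):
    """Probability of outcome b for party 2, summing over the outcomes of party 1."""
    return sum(p[(a, b, x, y)] for a in (0, 1))

def is_no_signaling(p, tol=1e-12):
    """True if each party's marginal is independent of the other party's input."""
    for a, x in product((0, 1), repeat=2):
        if abs(marginal_a(p, a, x, 0) - marginal_a(p, a, x, 1)) > tol:
            return False
    for b, y in product((0, 1), repeat=2):
        if abs(marginal_b(p, b, 0, y) - marginal_b(p, b, 1, y)) > tol:
            return False
    return True

# Strongly correlated table: outcomes satisfy a XOR b = x AND y.
correlated = {(a, b, x, y): 0.5 if (a ^ b) == (x & y) else 0.0
              for a, b, x, y in product((0, 1), repeat=4)}

# Signaling table: the outcome of party 1 reveals the input of party 2.
signaling = {(a, b, x, y): 0.5 if a == y else 0.0
             for a, b, x, y in product((0, 1), repeat=4)}

correlated_ok = is_no_signaling(correlated)
signaling_ok = is_no_signaling(signaling)
```

The first table is strongly correlated but every marginal is uniform, so it passes the test. In the second table party 1 learns the input of party 2 from its own outcome, so the test fails.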

Assumption 7 (Local Tomographic Principle). The global state of a multipartite system 
can be completely determined by specifying the joint probabilities of outcomes for local 
tomographic measurements.

Given a system composed of $n$ parts, it follows from the above assumption that a
state of the system can be described by a vector with entries of the form

$$p(r_1, r_2, \ldots, r_n | M_1, M_2, \ldots, M_n),$$

where $r_i$ is an outcome of a tomographic measurement $M_i$ acting only on party $i$.

The normalized states of the composed system must satisfy

$$\sum_{r_1, \ldots, r_n} p(r_1, r_2, \ldots, r_n | M_1, M_2, \ldots, M_n) = 1$$

but, as before, we allow subnormalized states as well. The no-signaling principle implies
that, for any bipartition $\{S, S^c\}$ of the set $\{1, \ldots, n\}$, the marginal distribution for the
parties in $S$ obtained by summing over all outcomes of the parties $i \in S^c$,

$$\sum_{r_i, \, i \in S^c} p(r_1, r_2, \ldots, r_n | M_1, M_2, \ldots, M_n), \tag{1.9}$$

does not depend on the measurements $M_i$ with $i \in S^c$. This means that marginal
probability distributions are well defined and this allows the definition of the reduced state of
a subsystem $i$, as the vector with entries given by [footnote 2]

$$p(r_i | M_i) = \sum_{r_j, \, j \ne i} p(r_1, r_2, \ldots, r_n | M_1, M_2, \ldots, M_n). \tag{1.10}$$

Definition 12. For every state of a multipartite system described by joint probabilities
of the form $p(r_1, r_2, \ldots, r_n | M_1, M_2, \ldots, M_n)$, the marginal distribution $p(r_i | M_i)$ is well
defined and is called the reduced state of party $i$.


From now on, every time we refer to a multipartite system we will use only joint
probabilities of local tomographic measurements to describe its state and for every
subsystem we will use the same set of tomographic measurements to describe its reduced
state. The connection is given by equation (1.10).


As expected, a natural constraint we will impose is that the reduced state of each 
subsystem is an allowed state of this subsystem. 


Constraint 2. Let $\mathcal{S}$ be the set of allowed states for a multipartite system and $\mathcal{S}^i$ be
the set of allowed states for a subsystem $i$. Let $P \in \mathcal{S}$, and $P_i$ be the reduced state of
subsystem $i$. We require that $P_i \in \mathcal{S}^i$.


The result below gives a connection between the vector spaces associated to the
individual systems and the vector space associated to the composite system [Bar07].

Theorem 2. Let $P$ be a state of a multipartite system and $P_i$ be the reduced state of
party $i$. If $P$ belongs to the vector space $V$ and each $P_i$ belongs to the vector space $V^i$,
then

$$V = \bigotimes_i V^i.$$


2 We can also define the reduced state of a subset S of parties in an analogous form, using equation

Proof. We will prove the statement above for the particular case of bipartite systems.
Since we only consider finite dimensional systems, the general case follows if we apply
the particular case several times.

Let $Q^{12}_{ijkl}$ be the vector with entry 1 for outcome $i$ of tomographic measurement $k$
in party 1 and outcome $j$ for tomographic measurement $l$ in party 2 and 0 elsewhere.
Define the vectors $Q^1_{ik}$ and $Q^2_{jl}$ analogously. Notice that these vectors are not necessarily
allowed states of the system. Nevertheless, the vectors $Q^{12}_{ijkl}$ generate $V$, the vectors
$Q^1_{ik}$ generate $V^1$, the vectors $Q^2_{jl}$ generate $V^2$ and

$$Q^{12}_{ijkl} = Q^1_{ik} \otimes Q^2_{jl},$$

which implies the desired result. □

We can prove that any state of the composite system can be written as a linear
combination of product states [Bar07].

Theorem 3. Any state $P$ of an $n$-partite system can be written in the form

$$P = \sum_i q_i \, P^1_i \otimes P^2_i \otimes \cdots \otimes P^n_i, \tag{1.11}$$

where $P^j_i$ is a normalized and pure state of the party $j$ and $q_i \in \mathbb{R}$.

Proof. We will once more prove the statement for $n = 2$, since the general case follows
easily from this one.

Consider a composite system consisting of parties 1 and 2 in state $P \in V = V^1 \otimes V^2$.
By assumption 5, for each tomographic measurement $l$ in party 2 there is one operation
on the composite system that corresponds to performing that measurement. Let $\{M_{jl}\}_j$
be the set of matrices representing this operation, $j$ labeling the possible outcomes.

Let $P_{jl} = M_{jl} P$ be the final state after outcome $j$ and let $P^1_{jl}$ be the corresponding
reduced state of system 1. Then

$$P = \sum_{j,l} P^1_{jl} \otimes Q^2_{jl}, \tag{1.12}$$

where the vector $Q^2_{jl}$ was defined in the proof of theorem 2.

To prove equation (1.12), let us compare the entries of $P$ and $\sum_{j,l} P^1_{jl} \otimes Q^2_{jl}$. Each entry
of $P$ is of the form $p(i,j|k,l)$, which is the probability of outcome $i$ for tomographic
measurement $k$ in system 1 and outcome $j$ of tomographic measurement $l$ in system
2. An entry of $P^1_{jl} \otimes Q^2_{jl}$ can be non-zero only if it is in position $(i,j|k,l)$ for some outcome $i$
of tomographic measurement $k$ in party 1. This entry is equal to the entry $(i|k)$ of
$P^1_{jl}$, which is the probability of outcome $j$ for tomographic measurement $l$ in system
2 followed by outcome $i$ for tomographic measurement $k$ in system 1. Since local
operations commute, equation (1.12) follows.

Let $U \otimes W \in V$ with $U \in (\mathcal{S}^1)^\perp$. Then equation (1.12) implies that

$$(U \otimes W) P = 0.$$

Repeating the same argument but exchanging the parties, we conclude that for any
vector of the form $U \otimes W$ with $W \in (\mathcal{S}^2)^\perp$ we have

$$(U \otimes W) P = 0.$$

This implies that $P$ belongs to the subspace generated by $U \otimes W$, $U \in \mathcal{S}^1$ and
$W \in \mathcal{S}^2$. Since each $\mathcal{S}^i$ is generated by the states that are normalized and pure, the
result follows. □

States of the form $P^1 \otimes P^2 \otimes \cdots \otimes P^n$ are called product states. If a state can be written
as a convex combination of product states, that is, if we can choose the coefficients $q_i$ in
equation (1.11) such that $0 \le q_i \le 1$ and $\sum_i q_i = 1$, it is called a separable state. States
that cannot be written in this form are called entangled.
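As an illustration of the difference between product and separable states, the sketch below builds a convex mixture of two product states of a pair of classical bits. It is not part of the original text and assumes that each bit is described by a single tomographic measurement, so that a state is just a probability vector and the tensor product is the Kronecker product of vectors. All names are hypothetical.

```python
# Sketch: a separable state that is not a product state.
# Assumption: each party has one tomographic measurement with two outcomes.

def kron(u, v):
    """Kronecker product of two vectors; entry (i, j) sits at index i*len(v) + j."""
    return [a * b for a in u for b in v]

def mix(weights, states):
    """Convex combination sum_i q_i P_i of state vectors of equal length."""
    return [sum(q * s[k] for q, s in zip(weights, states))
            for k in range(len(states[0]))]

def reduced_first(P):
    """Reduced state of party 1 for two two-outcome parties."""
    return [P[0] + P[1], P[2] + P[3]]

def reduced_second(P):
    """Reduced state of party 2 for two two-outcome parties."""
    return [P[0] + P[2], P[1] + P[3]]

e0, e1 = [1.0, 0.0], [0.0, 1.0]  # pure states of a classical bit

# Equal mixture of the product states e0 x e0 and e1 x e1.
P = mix([0.5, 0.5], [kron(e0, e0), kron(e1, e1)])

# Product of the reduced states, for comparison.
product_of_marginals = kron(reduced_first(P), reduced_second(P))
```

The mixture `P` is separable by construction, yet it differs from the product of its reduced states: the two bits are perfectly correlated.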

Consider a composite system and a transformation $T^1$ acting in subsystem 1, represented
by the matrix $M^1$. We know that this transformation is allowed in the composite
system and that the resulting effect is linear. Hence there is a matrix $\widetilde{M}^1$ such that the
transformation on the composite system is given by

$$P \mapsto P' = \widetilde{M}^1 P.$$

We want to find out what the relation is between $M^1$ and $\widetilde{M}^1$ [Bar07].

Theorem 4. Consider a multipartite system in a state $P$ and a local transformation $M^1$
on subsystem 1, defined by

$$P^1 \mapsto P'^1 = M^1 P^1.$$

The joint transformation on the composite system is given by

$$P \mapsto P' = (M^1 \otimes I \otimes \cdots \otimes I) P. \tag{1.13}$$


Proof. We will once more prove the statement for a bipartite system, since the general
case follows from this one.

Let the set of tomographic measurements of systems 1 and 2 used to write $P$ and
$P'$ be fixed. Consider the following procedure: apply $T^1$ to system 1 and then the
tomographic measurements of systems 1 and 2. The entries of the vector $P'$ give the
probability of each possible outcome of this procedure. By assumption 6, the order of
the operations in systems 1 and 2 does not matter and this procedure is equivalent to:
first apply the tomographic measurement in system 2, then apply $T^1$ to system 1 and
then apply the tomographic measurement to system 1. The probabilities for the possible
outcomes of this procedure are also given by $P'$.

The probability of outcome $j$ for tomographic measurement $l$ in system 2 and outcome
$i$ for tomographic measurement $k$ in system 1, before transformation $T^1$ is applied,
is given by entry $P_{ijkl} = p(i,j|k,l)$ of vector $P$. After transformation $T^1$ is applied, the
probability of outcome $j$ for tomographic measurement $l$ in system 2 and outcome $i$ for
tomographic measurement $k$ in system 1 is

$$P'_{ijkl} = \sum_{i'k'} M^1_{ik,i'k'} P_{i'jk'l} = [(M^1 \otimes I) P]_{ijkl}.$$

This implies that the action on $\mathcal{S}$ of the matrix $\widetilde{M}^1$ representing the transformation on the
composite system is equal to the action of $M^1 \otimes I$. Since the action of $\widetilde{M}^1$ outside $\mathcal{S}$ is
arbitrary, we can take $\widetilde{M}^1 = M^1 \otimes I$.

□
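The statement of theorem 4 can be illustrated with a small numerical sketch, which is not part of the original text. It assumes two classical bits, each described by a single tomographic measurement, so that states are probability vectors, the tensor product is the Kronecker product, and $I$ is the $2 \times 2$ identity. The local transformation `M` (a bit flip) and all helper names are hypothetical.

```python
# Sketch: a local transformation M on party 1 acting as M (x) I on the pair.
# Assumption: both parties are classical bits with one tomographic measurement.

def kron(u, v):
    """Kronecker product of vectors; entry (i, j) sits at index i*len(v) + j."""
    return [a * b for a in u for b in v]

def kron_mat(A, B):
    """Kronecker product of matrices, using the same index ordering as kron."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def mat_vec(M, P):
    return [sum(m * p for m, p in zip(row, P)) for row in M]

def reduced_second(P, d1, d2):
    """Reduced state of party 2: sum over the outcomes of party 1."""
    return [sum(P[i * d2 + j] for i in range(d1)) for j in range(d2)]

M = [[0, 1], [1, 0]]   # hypothetical local transformation: flip the bit
I2 = [[1, 0], [0, 1]]
M_joint = kron_mat(M, I2)

# Acting on a correlated state turns it into an anti-correlated one.
P_corr = [0.5, 0.0, 0.0, 0.5]
P_after = mat_vec(M_joint, P_corr)

# On a product state, M (x) I acts on the first factor only.
P1, P2 = [0.9, 0.1], [0.2, 0.8]
lhs = mat_vec(M_joint, kron(P1, P2))
rhs = kron(mat_vec(M, P1), P2)
```

The reduced state of party 2 is the same before and after the transformation, which is the no-signaling property at work.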

Now that we know how local operations act in the description of composite
systems, we can go back to item 4 of constraint 1 and see how this restricts the
allowed transformations in each subsystem. We have stated that each local transformation
$M_i$ on a subsystem $i$ must result in an allowed state of the multipartite system as
well. This means that not only does $M_i$ have to be an allowed transformation of system $i$, but
$M_i \otimes I \otimes \cdots \otimes I$ also has to define an allowed transformation on the composite system. This
extra requirement may reduce even further the set of allowed transformations in the
individual system $i$.

Definition 13. A transformation $T$ on a system 1, represented by matrix $M$, is well
defined if

$$(M \otimes I) P^{12} \in \mathcal{S}^{12}$$

for all states $P^{12} \in \mathcal{S}^{12}$, where system 2 can be any other system allowed by the theory.


Constraint 3. For each system, all transformations in $\mathcal{T}$ must be well defined.

System 2 can itself be a multipartite system, so the general requirement of item 4
of constraint 1 is implied by the special case of bipartite systems of definition 13 and
constraint 3.

Assumption 5 together with theorem 4 implies that the allowed transformations of a
composite system must include the ones given by equation (1.13).

Corollary 2. If $M^1$ is an allowed transformation on system 1, then $M^1 \otimes I$ is an allowed
transformation of a composed system consisting of system 1 and another arbitrary system
2.


We desire that our description include the possibility of multipartite systems with no
correlation among their parties. This is quite natural: imagine that the parties of this
system are thousands of kilometers apart and that none of them interacted in the past. We
do not expect any correlation among the outcomes obtained in local measurements
performed in these subsystems, and this implies that the joint probabilities are independent:

$$p(r_1, r_2, \ldots, r_n | M_1, M_2, \ldots, M_n) = p(r_1|M_1)\, p(r_2|M_2) \cdots p(r_n|M_n), \tag{1.14}$$

where $r_i$ is the outcome of local measurement $M_i$ on party $i$.

Assumption 8. If $P^1$ is an allowed state of system 1 and $P^2$ is an allowed state of
system 2, then $P^1 \otimes P^2$ is an allowed state of the system composed of parties 1 and 2.

The state $P^1 \otimes P^2$ gives independent probabilities for the bipartite system, in the form
of equation (1.14). The meaning is that system 1 is in state $P^1$, system 2 is in state
$P^2$ and they are independent. Again, since system 2 can itself be a multipartite system,
assumption 8 also implies that any vector of the form (1.14) is an allowed state of the
system composed of parties $1, 2, \ldots, n$ in which party $i$ is in the state given by the probabilities
$p(r_i|M_i)$.

The next assumption is another simplification without physical meaning. We will
include in the set $\mathcal{T}$ all transformations that are mathematically well defined. There is
no physical requirement that guarantees that this is indeed the case. For a particular kind
of system, it is possible that nature forbids, for some reason, some of the transformations
contained in this set. As our intention is to be general, we will define $\mathcal{T}$ to be the largest
set of mathematically allowed transformations.

Definition 14. A probability theory is called maximal if the set $\mathcal{T}$ coincides with the
set of all mathematically well defined transformations.

Assumption 9. All probability theories considered from now on are maximal.

A number of corollaries follow from this assumption. The first one is something we
would like to have in our theories: the composition of two allowed transformations is
an allowed transformation. Mathematically, the composition of transformations represented
by matrices $M$ and $N$ is given by the product $MN$. Then, if $M$ and $N$ are matrices
associated to allowed transformations of a system, we expect that $MN$ is also an allowed
transformation of the same system, and this is indeed the case if $\mathcal{T}$ satisfies assumption
9.

Corollary 3. If $M, N \in \mathcal{T}$, then $MN \in \mathcal{T}$.

Suppose we start with system 1 in a state $P^1$ and we append another independent
system 2 in state $P^2$. As we know, the state of the system composed of subsystems 1
and 2 is $P^1 \otimes P^2$. Suppose that we apply an operation to the composite system, taking
$P^1 \otimes P^2$ to another state $P'$, not necessarily a product state. This state gives a reduced
state $P'^1$ that is an allowed state of system 1. This kind of procedure can be used to
perform transformations on system 1 alone, and system 2 is just used as an ancilla that
can be discarded after the process is completed.

Corollary 4. A procedure consisting of appending an ancilla to system 1, performing
a joint operation on the composed system, and then throwing the ancilla away is a well
defined transformation on system 1.
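The procedure of corollary 4 can be sketched for two classical bits. The example is illustrative and not part of the original text: it assumes that each bit is described by a single tomographic measurement, and it uses a hypothetical joint operation `C` that flips system 1 whenever the ancilla is in outcome 1. The induced matrix on system 1 is obtained by running the procedure on the basis vectors, which is legitimate because every step is linear.

```python
# Sketch: the transformation induced on system 1 by an ancilla procedure.
# Assumption: both systems are classical bits with one tomographic measurement.

def kron(u, v):
    """Kronecker product of vectors; entry (i, j) sits at index i*len(v) + j."""
    return [a * b for a in u for b in v]

def mat_vec(M, P):
    return [sum(m * p for m, p in zip(row, P)) for row in M]

def reduced_first(P, d1, d2):
    """Reduced state of system 1: sum over the outcomes of the ancilla."""
    return [sum(P[i * d2 + j] for j in range(d2)) for i in range(d1)]

def induced_map(joint, ancilla, d1):
    """Matrix on system 1 obtained by appending the ancilla, applying the
    joint matrix, and discarding the ancilla."""
    d2 = len(ancilla)
    columns = []
    for i in range(d1):
        basis = [1.0 if k == i else 0.0 for k in range(d1)]
        out = mat_vec(joint, kron(basis, ancilla))
        columns.append(reduced_first(out, d1, d2))
    # columns[j] is the image of basis vector j; transpose into rows
    return [[columns[j][i] for j in range(d1)] for i in range(d1)]

# Hypothetical joint operation: (i, j) -> (i XOR j, j), a controlled flip.
C = [[1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0],
     [0, 1, 0, 0]]

ancilla = [0.2, 0.8]  # the ancilla is in outcome 1 with probability 0.8
M_eff = induced_map(C, ancilla, 2)
```

With this choice the induced transformation flips the bit of system 1 with probability 0.8 and leaves it unchanged otherwise.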

Physically we already have everything we need in our probabilistic theories. We can
add some mathematical structure to our description without having to restrict it any
further. The reader may skip the next section without loss for the understanding
of the rest of the text.


1.3 A little bit of Category Theory 

Previously we have defined a probabilistic model using vectors in $\mathbb{R}^d$ as states and
matrices acting in this vector space as transformations. We can provide a more formal
and general definition. The point of view we present here is a simplification of the
approach of Barnum and Wilce in reference [BW12].

The first thing we need for our new definition is an ordered linear space: a real vector
space $E$ equipped with a closed generating cone $E_+$. Such a cone determines a partial
ordering, invariant under translation and under positive scalar multiplication: if $a, b \in E$
we say that $a \le b$ iff $b - a \in E_+$. An order unit in $E$ is an element $u \in E_+$ such that for
every $a \in E$ there is $n \in \mathbb{N}$ such that $a \le nu$. We use $(E, u)$ to denote an ordered linear
space $E$ with an order unit $u$. We say that $(E, u)$ is an order-unit space. In this text, we
will deal only with finite dimensional ordered linear spaces. In this case, $E$ always has an
order unit.

Definition 15. A state on an order-unit space $E$ is a linear functional $\alpha \in E^*$ with
$\alpha(u) \le 1$.

Once more, our definition allows subnormalized states, with the same meaning as
before. The normalized states are the ones with $\alpha(u) = 1$. The set of all states on $E$ is
called the state space on $E$ and is denoted by $\mathcal{S}(E)$. This set is a compact and convex
set in $E^*$.

Definition 16. An effect on an order-unit space $E$ is a non-zero element $a \in E$ with
$a \le u$ and $0 \le \alpha(a) \le 1$, $\forall \alpha \in \mathcal{S}(E)$.

The set of all effects in $E$ will be denoted by $\mathcal{E}(E)$. The effects in $E$ play the role
of the elements of $\mathcal{T}$. They represent possible outcomes of measurements that can be
performed on the system. Each measurement is then given by a set of effects in $E$.
We continue following the lines of assumption 1 and this implies that we only consider
measurements with a finite number of outcomes.


Definition 17. A measurement on an order-unit space $E$ is a finite set $O = \{a_1, a_2, \ldots, a_n\}$
of effects $a_i$ with

$$a_1 + a_2 + \cdots + a_n = u.$$


If $\alpha$ is a normalized state, the probability of obtaining outcome $a_i$ in measurement
$O$ is $\alpha(a_i)$. Different measurements can share an outcome $a_i$, and the probability of
obtaining this outcome is independent of the measurement in which it appears.
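A classical system gives a simple instance of these definitions. The sketch below is illustrative and not part of the original text: it assumes $E = \mathbb{R}^3$ ordered by the cone of vectors with non-negative entries, takes $u = (1, 1, 1)$ as the order unit, and represents a state $\alpha$ by a vector acting on effects through the usual dot product. The particular measurement and the helper names are hypothetical.

```python
# Sketch: states, effects and a measurement in a classical order-unit space.
# Assumptions: E = R^3 with the cone of non-negative vectors, u = (1, 1, 1),
# and a state alpha acts on an effect a as the dot product of their vectors.

def evaluate(alpha, a):
    """alpha(a): the value of the functional alpha on the element a."""
    return sum(x * y for x, y in zip(alpha, a))

def is_measurement(effects, u, tol=1e-12):
    """Check that the effects lie between 0 and u and add up to u."""
    for a in effects:
        if any(x < -tol or x > y + tol for x, y in zip(a, u)):
            return False
    totals = [sum(a[k] for a in effects) for k in range(len(u))]
    return all(abs(t - y) <= tol for t, y in zip(totals, u))

u = [1.0, 1.0, 1.0]
alpha = [0.5, 0.3, 0.2]  # a normalized state: alpha(u) = 1

# Hypothetical three-outcome measurement with one unsharp pair of effects.
measurement = [[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.5],
               [0.0, 0.0, 0.5]]

valid = is_measurement(measurement, u)
probs = [evaluate(alpha, a) for a in measurement]
```

The outcome probabilities $\alpha(a_i)$ add up to $\alpha(u) = 1$, as they must for a normalized state.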

Once a measurement is performed and a given outcome is obtained, the state of the
system will change, and hence every effect is related to a transformation on $\mathcal{S}(E)$, that
has to obey restrictions already discussed in sections 1.1 and 1.2.

Definition 18. A probabilistic model is given by an order-unit space $E$, which determines
the state space $\mathcal{S}(E)$ and the set of effects $\mathcal{E}(E)$.

Here we assume that $\mathcal{S}(E)$ contains all mathematically well defined states and $\mathcal{E}(E)$
contains all mathematically well defined effects. More restrictive models can be considered,
but we will not deal with them in this text.

Multipartite systems can be represented using composition of models. The composition
will be another model, together with a way of connecting states and effects in the
single system with some particular states and effects of the composite system.

Let $E$ and $F$ be two order-unit spaces, representing systems 1 and 2 respectively.
The composite system whose parts are 1 and 2 is represented in an order-unit space $EF$,
together with a positive linear mapping

$$E \times F \to EF$$

$$(a, b) \mapsto ab. \tag{1.15}$$

This mapping gives the connection between states and effects of $E$ and $F$ and $EF$
we mentioned above. Its positivity implies that if $a$ is an effect on $E$ and $b$ is an effect
on $F$, $ab$ is an effect on $EF$. A number of other requirements must be satisfied by this
map and also by the states in $(EF)^*$. All assumptions made in section 1.2 will hold for
states and effects in $EF$ as well. Since we already provided a detailed discussion there,
we will not repeat it here. For a different and more mathematical point of view and also
for a discussion of the conditions we must impose in the map of equation (1.15), see
reference [BW12].


1.3.1 Processes and Categories 

A theory aiming to describe physical systems has to provide rules that must be obeyed 
when a system changes. We already discussed these rules when this change does not 
alter the type of system we are dealing with, but it might be the case that it does alter 
the type of the system we are trying to describe. We have then to define what are the 
valid mappings between different types of systems. These mappings are called processes. 

Definition 19. Given two order-unit spaces $(E, u)$ and $(F, v)$, a process is a positive
linear mapping

$$\phi: E^* \to F^*$$

with $\phi(\alpha)(v) \le \alpha(u)$ for all states $\alpha$ in $(E, u)$.

A process is a map that takes states in $E$ to states in $F$. If $\alpha$ is a normalized state,
$\phi(\alpha)(v)$ is the probability that $\phi$ occurs given that the initial state is $\alpha$. Of course, not
every positive linear map counts as a process. The discussion of constraint 1 applies also
in this case with very little modification.

Definition 20. A process $\phi: E^* \to F^*$ is well defined if

$$\phi \otimes I: (EG)^* \to (FG)^*$$

also takes states on $(EG)^*$ to states on $(FG)^*$, for every order-unit space $G$, where $(EG)^*$
is the state space of the system composed of parties $E$ and $G$, $(FG)^*$ is the state space of
the system composed of parties $F$ and $G$ and $\phi \otimes I$ is the extension of $\phi$ to the composite
system $EG$ (which is defined as applying $\phi$ to system $E$ and doing nothing in system $G$).

Processes must take allowed states of the system to allowed states also when the
system under consideration is a part of a multipartite system. That is why we require
that all processes are well defined.

We also assume that convex combinations and composites of processes are also
processes, for the obvious reasons. For every pair of order-unit spaces $E$ and $F$ there is a
null process that takes every state $\alpha \in E^*$ to the zero vector in $F^*$. The interpretation
of this state is the same as before, and it can be prepared by conditioning on an outcome of
a measurement that happens with probability zero.

We postulate the existence of a canonical trivial system $I$ with a single operation,
and hence with no measurement. For this system, $E = E^* = \mathbb{R}$. We do not have many
options in this case, since the only normalized state is 1, which gives probability one for
the only possible effect.

Given an order-unit space $E$, there are two kinds of natural processes involving $E$ and
the trivial system $I$. The first one is a mathematical representation of the experiment that
prepares a state. For every normalized state $\alpha \in E^*$ we define the process $\phi_\alpha: \mathbb{R} \to E^*$
of preparation of $\alpha$ given by

$$1 \mapsto \alpha.$$

The second kind of process is a mathematical representation of obtaining the outcome
related to an effect in a measurement. For every effect $a$ we define the process
$\psi_a: E^* \to \mathbb{R}$ of registration of the outcome $a$, taking $\alpha$ to $\alpha(a)$.

Definition 21. A probabilistic category is a category $\mathcal{C}$ such that

1. Every object in $\mathcal{C}$ is a probabilistic model, including the trivial one;

2. The set of morphisms between two objects in $\mathcal{C}$ is the set of well defined processes
between the corresponding models.

The set of effects on an order-unit space $E$ can be identified with a subset of $\mathcal{C}(E, I)$
by the injection

$$a \mapsto \psi_a: E^* \to \mathbb{R}$$

that takes each effect $a \in E$ to the corresponding registration process $\psi_a$, and the set of
all states on $E$ can be identified with a subset of $\mathcal{C}(I, E)$ by the injection

$$\alpha \mapsto \phi_\alpha: \mathbb{R} \to E^*$$

that takes each state $\alpha \in E^*$ to the corresponding preparation process $\phi_\alpha$.

We must make one more imposition on the kind of categories representing probabilistic
theories. We already know how to represent bipartite systems, via equation (1.15),
but when we consider tripartite systems the composition may not be associative, so we
require associativity. This is not a trivial requirement, but it is a very natural one. This
property implies that $\mathcal{C}$ has to be a symmetric monoidal category [Mac98].

Definition 22. A state-complete probabilistic theory is a probabilistic category $\mathcal{C}$,
equipped with a rule of composition $\mathcal{C} \times \mathcal{C} \to \mathcal{C}$ assigning to every pair of models
its composition according to equation (1.15), making $\mathcal{C}$ a symmetric monoidal category.

This kind of probabilistic theory is called state complete because every mathematically
well defined state in $E$ is an allowed state on the model. When we deal with real systems,
there may be physical constraints that forbid some particular states, but we will not deal
with this here.

Assumption 10. We only consider state-complete probabilistic categories.

We will see in chapter 3 and appendix C many other physical impositions we can make
on the system that restrict the set of allowed states.

1.3.2 Dual Processes 

The discussion above can be made using maps between effects instead of maps between
states. For every process $\phi: E^* \to F^*$, there is a dual process

$$\phi^*: F \to E$$

given by $\alpha(\phi^*(b)) = \phi(\alpha)(b)$ for all $b \in F$ and $\alpha \in E^*$. Physically, getting the outcome
related to the effect $\phi^*(b)$ in a measurement corresponds to applying process $\phi$ first and
then obtaining outcome $b$ in a measurement.
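In a classical setting the dual process is just the transpose. The sketch below is illustrative and not part of the original text: it assumes $E = \mathbb{R}^3$ and $F = \mathbb{R}^2$ with order units given by the all-ones vectors, states acting on effects through the dot product, and a process $\phi$ represented by a hypothetical matrix `A` with non-negative entries whose column sums do not exceed one.

```python
# Sketch: a process phi acting on states and its dual phi* acting on effects.
# Assumptions: E = R^3, F = R^2, order units are all-ones vectors, and
# alpha(a) is the dot product of the vectors representing alpha and a.

def evaluate(alpha, a):
    return sum(x * y for x, y in zip(alpha, a))

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

# Hypothetical process phi: E* -> F*, columns sum to at most one.
A = [[0.5, 1.0, 0.0],
     [0.5, 0.0, 0.5]]

def phi(alpha):
    """Schrodinger picture: the process acts on the state."""
    return mat_vec(A, alpha)

def phi_dual(b):
    """Heisenberg picture: the dual process acts on the effect."""
    return mat_vec(transpose(A), b)

alpha = [0.2, 0.3, 0.5]  # normalized state on E
b = [1.0, 0.5]           # effect on F
u, v = [1.0, 1.0, 1.0], [1.0, 1.0]

lhs = evaluate(alpha, phi_dual(b))  # alpha(phi*(b))
rhs = evaluate(phi(alpha), b)       # phi(alpha)(b)
p_occurs = evaluate(phi(alpha), v)  # probability that phi occurs
```

Both pictures give the same number, and the probability that the process occurs does not exceed $\alpha(u)$, in agreement with definition 19.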

Given a probabilistic category $\mathcal{C}$ we can define the dual category $\mathcal{C}^*$ using the dual
processes for $\mathcal{C}(E, F)$ instead of the processes. In physicists' language, $\mathcal{C}$ represents the
Schrödinger picture while $\mathcal{C}^*$ represents the Heisenberg picture [CTDL77].

The most important probabilistic theories for us are finite dimensional classical and
quantum probability theories. They will be presented in detail in sections 1.4 and 1.5.
Of course, they are not the only examples we can provide. In references [BW12] and
[Bar07], the reader can find a number of examples differing from these ones. We will
not present these examples here, but we emphasize that probabilistic theories beyond
quantum theory are of great importance in this work.


1.4 Classical Probability Theory

Classical probability theory was developed to describe the most elementary random
processes we deal with in our everyday life. The simplest example is a coin toss, where there
are two possible outcomes. Another familiar example is the throwing of a die: if we look
at the top face of the die, there are six possible outcomes: the numbers {1,2,3,4,5,6}.
Of course we can come up with much more complicated examples, but the most
important features are already present in these simple cases. The axiomatic system we
will present here was introduced by the Soviet mathematician Andrey Kolmogorov in the
1930s [SW95, GS01, Uam04]. Although this system can be used to describe a large
variety of random phenomena, it is not enough to describe the behavior of quantum
systems. This leads to other axioms for probability theory, and an example of such a more
general formulation is the one presented in the previous sections.

We now study classical models carefully, stressing how the elements of the
previous sections are represented in this class. All axioms in classical probability theory
look very natural, and it was indeed a shock to many people that nature does not always
behave in this way. These axioms imply a number of singular properties that make this
kind of theory different from any other in the framework. In this sense, classical theory
emerges as a very special exception.

1.4.1 Sample Spaces 

A classical probabilistic model consists of three basic elements. The first one is a set 
whose elements represent all possible outcomes in an experiment. This set is called the 
sample space of the experiment. 

Definition 23. The sample space $\Omega$ of a random experiment is a set in which every
element $\omega \in \Omega$ is associated to a possible outcome of the experiment.

Example 1 (The classical bit). The sample space of the game of heads and tails is a 
set with two elements, corresponding to the two possible outcomes of the experiment 
of tossing a coin. We could use the set {H,T} with the letter H representing outcome 
heads and letter T representing outcome tails. It is sometimes easier to work with sample 
spaces with numerical elements, since this allows the definition of a number of useful 
quantities we can use to get information about the experiment we are describing. In this 
case we generally use the set {0,1}, but {-1,1} is also pretty common. A classical system 
with sample space with only two elements is called a classical bit. 

Example 2. The sample space of the experiment of throwing a die and looking at its
top face is the set {1,2,3,4,5,6}, as we already know.

Example 3. Sometimes it is not trivial to define the sample space of an
experiment. Think about the possible outcomes of the following experiment: select
randomly an inhabitant of a country and measure their height. In principle the height of
a person is a number in the interval $(0,\infty)$, but of course we know that some values in
this set are highly unlikely, such as a height of a billion meters. The interval (0,3) seems
a much more reasonable sample space. Nowadays in Brazil we could use the interval
(0,2.37], since the tallest man we have record of, according to a quick search on Google,
is Joelisson Fernandes, who claims to be the tallest person in Brazil with 2.37 m [Wikc].³
If we were in Turkey instead of Brazil we would have to use at least the interval (0,2.51],
since the tallest man alive on Earth is Sultan Kosen, from Turkey, with 2.51 m [Wikg].

The set of all subsets of $\Omega$ will be denoted by $\mathcal{P}(\Omega)$. We would like to assign a
probability to all subsets of $\Omega$, but in general it is not possible to do that in a reasonable
manner. Because of this, we need the definition of measurable sets, the elements of
$\mathcal{P}(\Omega)$ for which we can define a probability. This is the second basic element of a
classical probabilistic model.

Definition 24. $\Sigma \subset \mathcal{P}(\Omega)$ is a $\sigma$-algebra if it satisfies:

1. $\Omega, \emptyset \in \Sigma$.

2. $\Sigma$ is closed under complementation: if $A \in \Sigma$, then so is its complement, $\Omega \setminus A$.

3. $\Sigma$ is closed under countable unions: if $\{A_1, A_2, A_3, \ldots\}$ is a countable sequence of
elements of $\Sigma$, then $A = \bigcup_i A_i$ is in $\Sigma$.

The sets $A \in \Sigma$ are called measurable sets. An ordered pair $(\Omega,\Sigma)$, where $\Omega$ is a sample
space and $\Sigma$ is a $\sigma$-algebra over $\Omega$, is called a measurable space.

Example 4. The trivial $\sigma$-algebra contains only two elements: the entire set $\Omega$ and its
complement, the empty set $\emptyset$.

Example 5 (Finite and countable sample space). When $\Omega$ is a finite or a countable set,
we usually take $\Sigma = \mathcal{P}(\Omega)$. This set is a $\sigma$-algebra even if $\Omega$ is not countable, but in this
case it might not be a good choice. For a classical bit with sample space {0,1} we have

$$\Sigma = \{\emptyset, \{0\}, \{1\}, \{0,1\}\}.$$

For the die, $\Sigma$ has 64 elements. In the finite case, if $\Omega$ has $n$ elements, $\Sigma$ has $2^n$
elements.
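The counting in Example 5 and the axioms of Definition 24 can be checked directly for small sample spaces. The following is only a minimal sketch in Python; the function names `power_set` and `is_sigma_algebra` are ours, and for a finite family of sets we assume that closure under pairwise unions is enough to guarantee closure under countable unions.

```python
from itertools import chain, combinations

def power_set(omega):
    """Return all subsets of the finite set omega as frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    return {frozenset(s) for s in subsets}

def is_sigma_algebra(sigma, omega):
    """Check the three axioms of a sigma-algebra for a finite family of sets."""
    omega = frozenset(omega)
    # Axiom 1: the whole set and the empty set belong to the family.
    if omega not in sigma or frozenset() not in sigma:
        return False
    # Axiom 2: closure under complementation.
    if any(omega - a not in sigma for a in sigma):
        return False
    # Axiom 3: for a finite family, pairwise unions suffice.
    return all(a | b in sigma for a in sigma for b in sigma)

bit = power_set({0, 1})
die = power_set({1, 2, 3, 4, 5, 6})
trivial = {frozenset(), frozenset({0, 1})}
```

Here `len(bit)` and `len(die)` reproduce the counts 4 and 64 quoted above, and `trivial` is the two-element $\sigma$-algebra of Example 4.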

Example 6 (Continuous sample space). Consider the experiment that consists of
selecting a number in the interval [0,1] with uniformly distributed probability. In this example,
$\Omega = [0,1]$, and if we take $\Sigma$ to be $\mathcal{P}([0,1])$ the $\sigma$-algebra will be too big and we will not
be able to define a probability for all subsets in it. We have to choose $\Sigma$ in such a way
that it allows the definition of a probability for all its elements, respecting the natural
properties probabilities must have, but in such a way that it is not so small as to leave
behind some subsets of [0,1] for which the definition of a probability is almost obvious.
For example, consider the subset $A = [0, \frac{1}{3}]$. If we choose a point in [0,1] randomly, and
if all points are equally likely, we expect this point to be in $A$ one third of the time. This
means that we should define the probability of $A$ as $\frac{1}{3}$, and hence we would like to have
$A \in \Sigma$. A similar argument holds for all intervals. This means that every interval should
belong to $\Sigma$. Most of the time the most convenient choice is to take $\Sigma$ as the minimal
$\sigma$-algebra that contains all intervals. This is the Borel $\sigma$-algebra $\mathcal{B}$, and its elements are
called Borel sets.

³ If you know anyone taller than Joelisson, let us know.


The third element we need in a classical probability space is the assignment of a
probability to each measurable set $A \in \Sigma$. We have been using this notion without
further consideration, with the interpretation that this number quantifies the idea of
relative frequency of a given outcome. It is related to the ratio

$$\frac{\text{number of occurrences of } A}{\text{number of independent trials of the experiment}}.$$

This definition depends on the assumption that this ratio converges as the number of
repetitions of the experiment grows.

This ratio should not be mistaken for the most naive definition of probabilities,
where all atomic elements of $\Sigma$ have the same probability. Here, one is adopting the
idea that there is some a priori probability distribution and that identically prepared
repetitions of the experiment will generate frequencies that converge to such a probability
distribution. For a more precise statement, we have the many versions of the Law of
Large Numbers [Jam04].

Being practical, we will focus only on the mathematical definition and assume the
existence of a real number associated to each measurable set in $\Sigma$, its probability. We
also assume that this association is done in such a way that the properties expected from
the interpretation of this number as a relative frequency in an experiment hold.

Definition 25. Let $(\Omega,\Sigma)$ be a measurable space. A function $\mu: \Sigma \to \overline{\mathbb{R}}_+ = \mathbb{R}_+ \cup \{\infty\}$ is
called a measure if it satisfies the following properties:

1. Non-negativity: $\mu(A) \geq 0 \ \forall A \in \Sigma$;

2. Nullity: $\mu(\emptyset) = 0$;

3. Countable additivity (or $\sigma$-additivity): For all countable collections $\{A_1, A_2, \ldots\}$ of
pairwise disjoint sets $A_i \in \Sigma$:

$$\mu\left(\bigcup_i A_i\right) = \sum_i \mu(A_i). \tag{1.16}$$

The measure $\mu$ is called a probability measure if $\mu(\Omega) = 1$. If $\mu$ is a probability
measure over the measurable space $(\Omega,\Sigma)$, the triple $(\Omega,\Sigma,\mu)$ is called a classical probability
space.⁴

⁴ Classical mathematicians do not need the word classical and use the term probability space for
the triple $(\Omega,\Sigma,\mu)$. We add a third word to avoid confusion with the general theories introduced in
the previous sections.
Definition 26. A subset $A$ of $\Omega$ to which a probability can be assigned is called an
event. If $A = \{\omega\}$, it is called an elementary event.

It follows from definition 25 that a measure $\mu$ also satisfies, as expected, the
properties of monotonicity and sub-additivity.

Corollary 5 (Monotonicity). If $A_1$ and $A_2$ are measurable sets with $A_1 \subset A_2$ then

$$\mu(A_1) \leq \mu(A_2).$$

Corollary 6 (Sub-additivity). For any countable sequence $\{A_1, A_2, \ldots\}$ of sets $A_i \in \Sigma$,
not necessarily disjoint, we have

$$\mu\left(\bigcup_i A_i\right) \leq \sum_i \mu(A_i).$$


Example 7 (The classical bit). A probability measure in the measurable space of a
classical bit is a vector in $\mathbb{R}^2$ of the form

$$\begin{bmatrix} p \\ 1-p \end{bmatrix},$$

where $p = \mu(0)$, $1 - p = \mu(1)$ and $0 \leq p \leq 1$.

Example 8 (The discrete case). In the discrete case, a probability measure $\mu$ in $(\Omega, \mathcal{P}(\Omega))$
is defined by a function $p: \Omega \to \mathbb{R}_+$ such that

$$\sum_{\omega \in \Omega} p(\omega) = 1.$$

The value of $\mu$ on an event $A \in \mathcal{P}(\Omega)$ is then given by equation (1.16):

$$\mu(A) = \sum_{\omega \in A} p(\omega).$$
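As an illustration of Example 8 and of Corollaries 5 and 6, a discrete probability measure can be built from its point masses. This is a sketch under our own assumptions: the helper `make_measure` and the choice of a fair die are illustrative and not part of the text, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction

def make_measure(p):
    """Build mu(A) = sum of p(w) for w in A from point masses p (dict: outcome -> weight)."""
    if any(w < 0 for w in p.values()) or sum(p.values()) != 1:
        raise ValueError("point masses must be non-negative and sum to one")
    return lambda event: sum(p[w] for w in event)

# Fair die: every elementary event has probability 1/6 (an illustrative choice).
mu = make_measure({w: Fraction(1, 6) for w in range(1, 7)})

even, low = {2, 4, 6}, {1, 2}
```

With these definitions, `mu(even)` is 1/2, and the overlapping events `even` and `low` satisfy the sub-additivity inequality strictly, since the outcome 2 is counted only once in `mu(even | low)`.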

Example 9 (The Lebesgue measure). One important measure in $([0,1], \mathcal{B})$ is the
Lebesgue measure $\ell$. The value of this measure on an interval $[a,b] \subset [0,1]$ is

$$\ell([a,b]) = b - a.$$

This definition can be extended to all elements of the $\sigma$-algebra $\mathcal{B}$ in a unique manner
[Jam04].

We will consider only finite sample spaces, which will meet the requirement of
assumption 1. We will always take $\Sigma = \mathcal{P}(\Omega)$ for simplicity.

Definition 27. A classical probabilistic model is a model in which every normalized
state is a probability measure in a measurable space $(\Omega,\Sigma)$. The set $\mathcal{T}$ of allowed
transformations is the largest set of linear transformations in $\mathcal{S}$ satisfying constraint 1.

Notice that when we assume $\Sigma = \mathcal{P}(\Omega)$ the only important information is the number
of elements in the sample space: two sample spaces with the same number of elements
describe the same type of system.
1.4.2 Transformations

An allowed transformation $M \in \mathcal{T}$ must map a state into another allowed state, according
to item 3 of constraint 1. This means that every element of $\mathcal{T}$ is a linear map that takes
every probability measure in $\Omega$ to another probability measure in $\Omega$, possibly multiplied by
a constant between zero and one, if the transformation does not preserve normalization.
Constraint 1 implies that each entry of the matrix associated to this transformation must
be non-negative, and the sum of each column must be a number between zero and one. In
the case that $M$ preserves normalization, it is a stochastic matrix.
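The matrix conditions just described are easy to test numerically. The sketch below assumes states are written as column vectors of probabilities; the function `is_allowed_matrix` and the two example matrices are hypothetical names and numbers chosen only for illustration.

```python
from fractions import Fraction

def is_allowed_matrix(m):
    """Entries non-negative and every column sum between zero and one."""
    n_cols = len(m[0])
    entries_ok = all(x >= 0 for row in m for x in row)
    columns_ok = all(0 <= sum(row[j] for row in m) <= 1 for j in range(n_cols))
    return entries_ok and columns_ok

def apply(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

half = Fraction(1, 2)
# Columns sum to one: a stochastic matrix, which preserves normalization.
noisy_flip = [[Fraction(9, 10), Fraction(1, 5)],
              [Fraction(1, 10), Fraction(4, 5)]]
# Columns sum to 1/2: allowed, but the output is subnormalized.
lossy = [[half, 0], [0, half]]
state = [Fraction(1, 4), Fraction(3, 4)]
```

Applying `noisy_flip` to `state` gives another probability vector, while `lossy` returns a vector with total weight 1/2, the "constant between zero and one" mentioned above.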

There is an important class of transformations in $\mathcal{T}$, given by the indicator functions
of elements of the $\sigma$-algebra $\Sigma$. Let $\Omega = \{\omega_1, \ldots, \omega_n\}$ and $A \in \Sigma$. Define $I_A$ as the $n \times n$
real diagonal matrix with

$$(I_A)_{ii} = \begin{cases} 1 & \text{if } \omega_i \in A, \\ 0 & \text{otherwise.} \end{cases}$$

This matrix is an element of $\mathcal{T}$. The matrices in $\mathcal{T}$ that are of this form give rise
to an important class of measurements, given by a partition of the sample space $\Omega$:
let $\{A_1, \ldots, A_m\}$ be a partition of $\Omega$ such that every $A_i$ in the partition belongs to $\Sigma$.
Then the set of matrices $\{I_{A_1}, \ldots, I_{A_m}\}$ defines a measurement in the model. Given a
normalized state of the system $\mu$, which is, by definition, a measure defined in $(\Omega,\Sigma)$,
the probability $p_i$ of outcome $i$, associated to the matrix $I_{A_i}$, is given by

$$p_i = \mu(A_i).$$
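A small numerical sketch may help to see how a partition defines a measurement. We assume that the state is stored as a probability vector indexed by the outcomes and that the probability of an outcome is the total weight of the unnormalized state obtained after applying the corresponding diagonal 0/1 matrix; the partition of the die below is a hypothetical example.

```python
from fractions import Fraction

def indicator_matrix(omega, event):
    """Diagonal 0/1 matrix for the event, with omega an ordered list of outcomes."""
    return [[1 if (i == j and w in event) else 0 for j in range(len(omega))]
            for i, w in enumerate(omega)]

def apply(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

omega = [1, 2, 3, 4, 5, 6]
partition = [{1, 2}, {3, 4, 5}, {6}]   # hypothetical coarse-grained die measurement
state = [Fraction(1, 6)] * 6           # fair die as a probability vector

# Probability of outcome i: total weight left after applying the i-th indicator matrix.
probabilities = [sum(apply(indicator_matrix(omega, a), state)) for a in partition]
```

The list `probabilities` contains the values $\mu(A_i)$ for the three cells of the partition, and they add up to one because the cells are disjoint and cover the sample space.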


To prove that all the matrices mentioned above indeed belong to $\mathcal{T}$ we still have to
check that item 4 of constraint 1 is also satisfied. Indeed, one can prove that for classical
models, all transformations satisfying items 1, 2 and 3 automatically satisfy item 4. We
will do it in subsection 1.4.5.

1.4.3 Classical probabilistic theory with finite sample spaces 

Definition 28. A classical probability theory is one in which all models are classical. In
this text, the sample spaces are all finite.

Since we are dealing with finite sample spaces, without loss of generality we can
consider $\Sigma = \mathcal{P}(\Omega)$ in all models. With this assumption, each model in a classical theory
is given by a sample space $\Omega$. We can always use a tomographic set with only one
element, the measurement associated to the partition in which every subset contains
only one element of $\Omega$.

Corollary 7. If $\Omega = \{\omega_1, \ldots, \omega_n\}$, the set that contains only the measurement associated
to the partition $\{\{\omega_1\}, \{\omega_2\}, \ldots, \{\omega_n\}\}$ is a tomographic set for the system given
by the measurable space $(\Omega, \mathcal{P}(\Omega))$.

This measurement is called the maximal measurement.

The existence of a tomographic set with only one element is a particularity of
classical theory, with drastic consequences for our way of thinking, as we will see soon.
Figure 1.1: The state space of a classical bit. The point A represents the normalized
extremal state for which $\mu(0) = 0$ and $\mu(1) = 1$ and point B represents the normalized
extremal state for which $\mu(0) = 1$ and $\mu(1) = 0$. Point C represents the unnormalized
state $0$.


Theorem 5. In a classical probability theory, the state space of the system associated
to sample space $\Omega$ is a simplex of dimension $|\Omega|$.

Proof. Let $\Omega = \{\omega_1, \ldots, \omega_n\}$ and take the tomographic set that consists only of the
maximal measurement $M$ with outcomes $r_1, \ldots, r_n$. Define the measures $\mu_i$ given by

$$\mu_i(\omega_j) = \delta_{ij}.$$

Since the states in a classical model are given by probability measures in $\Omega$, all of these
$n$ measures represent states in the state space of the system $\mathcal{S}$. They are also the only
pure states in $\mathcal{S}$, since all other measures in $\Omega$ can be written as convex sums of the $\mu_i$.
This implies that the set of normalized states is the simplex of dimension $n-1$ in $\mathbb{R}^n$.
By assumption 2, $\mathcal{S}$ is the convex hull of the $n$ points $\mu_i$ and $0$, which is
homeomorphic to the $n$-dimensional simplex in $\mathbb{R}^{n+1}$. □

Although the $n$-dimensional simplex is defined as a subset of $\mathbb{R}^{n+1}$, it can be
represented in $\mathbb{R}^n$, as the convex hull of the extremal normalized states and $0$. Figures 1.1
and 1.2 show this for $n = 2$ and $n = 3$, respectively.


Example 10 (The state space of a classical bit). We already saw in example 7 that the
normalized states of a classical bit are the vectors

$$\begin{bmatrix} p \\ 1-p \end{bmatrix},$$

where $p = \mu(0)$, $1 - p = \mu(1)$ and $0 \leq p \leq 1$. The state space of this system is then given
by the convex hull of this set of vectors and $0$, which is a triangle in $\mathbb{R}^2$. This set is
shown in figure 1.1.

Figure 1.2: The state space of a classical trit. The state space is the tetrahedron in $\mathbb{R}^3$
with extremal points B, E, G, H. The point B represents the normalized extremal state
for which $\mu(0) = 1$ and $\mu(1) = \mu(2) = 0$, point G represents the normalized extremal state
for which $\mu(0) = \mu(2) = 0$ and $\mu(1) = 1$, and point E represents the normalized extremal
state for which $\mu(0) = \mu(1) = 0$ and $\mu(2) = 1$. Point H represents the unnormalized state
$0$.


Example 11 (The state space of a classical trit). The normalized states of a classical
system with sample space {0,1,2} are vectors in $\mathbb{R}^3$ of the form

$$\begin{bmatrix} p \\ q \\ 1-p-q \end{bmatrix},$$

where $p = \mu(0)$, $q = \mu(1)$, $1 - p - q = \mu(2)$ and $0 \leq p, q, p + q \leq 1$. The state space of
this system is then given by the convex hull of this set of vectors and $0$, which is a
tetrahedron in $\mathbb{R}^3$. This set is shown in figure 1.2.


The simplex has the remarkable property that every point can be written uniquely as
a convex sum of the extremal points. The converse also holds: if in a convex set every
point can be written uniquely as a convex sum of the extremal points, then this set is a
simplex. For a proof of this claim, see reference [Roc97]. This result has an interesting
consequence when the convex set represents the state space of a system.


Theorem 6. If the state space of a system is a simplex, then it can be described by a 
classical probability space. 

Notice here that the only important thing in the classical probability spaces we
consider in this text is the number of elements of $\Omega$, since $\Sigma$ is always equal to $\mathcal{P}(\Omega)$.
This implies that once $|\Omega|$ is fixed, both the state space and the set of measurements
are determined and it makes no difference which particular symbols we use to represent
the elements of $\Omega$.
1.4.4 Compatibility

Earlier in this chapter we defined the notion of compatibility of measurements, connected to
their joint measurability. For measurements with repeatable outcomes in classical probability
theory there are no incompatible measurements, which makes the compatibility concept
unnecessary. This is quite easy to prove: the maximal measurement is a refinement
of all other measurements at the same time, a consequence of the fact that a finite
intersection of sets in a $\sigma$-algebra is also an element of the $\sigma$-algebra.

Corollary 8. In a classical system, all measurements with repeatable outcomes are 
compatible. 

One of the central aspects of the generalization presented in section 1.1 is that we
no longer demand this property from our models.

Incompatibility of measurements is one of the many strange features of non-classical
theories, and especially of quantum theory. It sounds pretty disturbing that nature forbids
us from extracting all information from a system by measuring it. The existence of incompatible
measurements has many interesting and intriguing consequences. One of them is the
contextual character of some non-classical theories, which we will see in chapter 2.

1.4.5 Multipartite systems in classical probability theory 

In classical probability theory, we require that a multipartite system can also be described 
in a classical probability space. Given the sample spaces of the individual systems, it is 
very easy to find the sample space associated to the joint system. 

Assumption 11. Given a bipartite system composed of classical parties 1 and 2,
associated to sample spaces $\Omega_1$ and $\Omega_2$, the global system is associated to the sample
space $\Omega_1 \times \Omega_2$.

By assumption 8, all product states are allowed and this implies that all measures in
$\Omega_1 \times \Omega_2$ are allowed states of the composite system, since every measure in this sample
space can be written as a convex sum of product states. This is a very important
statement, and implies the following result:

Theorem 7. Every state in a composite classical system can be written as a convex 
sum of product states. 
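The decomposition behind Theorem 7 can be made explicit for two classical bits: any joint distribution is the convex sum of the products of point masses, weighted by the joint probabilities themselves. The sketch below is only an illustration; the helper names and the perfectly correlated example state are our own choices.

```python
from fractions import Fraction
from itertools import product

def point_mass(n, k):
    """Extremal state concentrated on outcome k of a sample space with n elements."""
    return [1 if i == k else 0 for i in range(n)]

def tensor(u, v):
    """Product state of two probability vectors, ordered as (0,0), (0,1), (1,0), (1,1)."""
    return [a * b for a in u for b in v]

# A perfectly correlated joint state of two classical bits (illustrative numbers).
joint = [Fraction(1, 2), 0, 0, Fraction(1, 2)]

# Convex decomposition: weight joint(i, j) on the product of point masses at i and j.
terms = [(joint[2 * i + j], tensor(point_mass(2, i), point_mass(2, j)))
         for i, j in product(range(2), repeat=2)]
rebuilt = [sum(w * s[k] for w, s in terms) for k in range(4)]

# The same state is not a single product state: the product of its marginals differs.
marginal = [Fraction(1, 2), Fraction(1, 2)]
uncorrelated = tensor(marginal, marginal)
```

The weights in `terms` are non-negative and sum to one, so `rebuilt` is a convex sum of product states that reproduces `joint`, even though `joint` differs from the product of its marginals.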

This is not true for every theory. In fact, in theorem 3 we have proved that all
states can be written as a linear combination of product states, but there might be
states for which it is not possible to find a linear combination of this type with all
coefficients positive. This is the case for quantum theory and also for many other
theories in the framework. As a corollary of this observation, we can prove the following
result:

Theorem 8. In a classical model, if a linear map defined in $\mathcal{S}$ satisfies positivity,
normalization and state preservation, it automatically satisfies complete state preservation.
Proof. In this proof we use the notation introduced in section 1.1. Let $f$ be a map
satisfying positivity, normalization and state preservation. This means that $f$ takes a
state in $\mathcal{S}$ to another state in $\mathcal{S}$. Suppose now that our system is part of a composite
system. Let $\rho$ be a state of the composite system. Since every state of the system can
be written as a convex combination of product states, all of them are of the form

$$\rho = \sum_i a_i \, \rho_1^i \otimes \rho_2^i,$$

where each $\rho_1^i$ is a state in $\mathcal{S}$, each $\rho_2^i$ is a state of some other arbitrary subsystem,
$0 \leq a_i \leq 1$ for every $i$ and $\sum_i a_i = 1$. Then, if we apply the map $f$ we get

$$\rho' = \sum_i a_i \, f(\rho_1^i) \otimes \rho_2^i.$$

Since $f(\rho_1^i)$ is an allowed state in $\mathcal{S}$ for every $i$, $\rho'$ is also a convex combination of
product states, and hence another valid state of the composite system. □

In this thesis, every time we say that a system or an experiment is classical, we mean 
that it can be described by a classical probabilistic model. We stress this fact because the 
word classical can be used in many different situations with different meanings and we do 
not want to create any confusion. In the same way, every time we say that something is 
not classical we mean that it does not admit a description through a classical probabilistic 
model. Many of the models in the framework presented in this chapter are not classical. 
One of them is the model obtained with quantum theory, which we will present in the 
next section. 
1.5 Quantum Probability Theory


Quantum Mechanics deals with
nature as She is - absurd.

Richard Feynman, [Fey88]


Quantum theory is, at the same time, the first physical theory where the
probabilistic character is considered intrinsic, and the first physical theory which does not
fit into classical probabilistic models under reasonable assumptions. In this section we
will see how states and measurements are described in this theory. For a more
complete treatment and for applications to the description of specific physical systems, see
[FLS65, CTDL77, Per95, NC00, Gri05, ABT11].

Definition 29. A quantum probabilistic model is a model in which the state space is in
one-to-one correspondence with the set of positive operators $\rho$ acting on a fixed Hilbert
space $\mathcal{H}$ over $\mathbb{C}$ such that $\mathrm{Tr}(\rho) \leq 1$. This set will be denoted by $\mathcal{S}(\mathcal{H})$. The set
$\mathcal{T}(\mathcal{H})$ of allowed transformations is the largest set of linear transformations satisfying
constraint 1. These transformations correspond to a special type of linear transformations
acting in $M(\mathcal{H})$, as we will see later.

The normalized states are the ones with $\mathrm{Tr}(\rho) = 1$. They are called the density
operators of $\mathcal{H}$. Once an orthonormal basis is fixed, each density operator is given by
a positive matrix with unit trace. These matrices are called density matrices. We will
often use the letter $\rho$ to denote both density operators and density matrices, and the
specific meaning in each case must be clear from the context. The set of all density
operators acting in $\mathcal{H}$ will be denoted by $D(\mathcal{H})$. The set of all matrices acting on $\mathcal{H}$
will be denoted by $M(\mathcal{H})$. We will consider only the cases with finite dimensional $\mathcal{H}$,
to satisfy the requirement of assumption 1. The type of system is determined by the dimension of $\mathcal{H}$.

Theorem 9. The pure states of a quantum model are the one-dimensional projectors
over $\mathcal{H}$.

Proof. Clearly, the pure states are also normalized states, so we only have to worry about
the extremal points of the set $D(\mathcal{H})$. Every density matrix can be written in spectral
decomposition

$$\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|, \quad p_i \geq 0, \quad \sum_i p_i = 1, \tag{1.17}$$

where each $|\psi_i\rangle$ is a vector in $\mathcal{H}$ with unit norm. This proves that each density matrix
can be written as a convex combination of one-dimensional projectors. On the other
hand, the one-dimensional projectors $|\psi\rangle\langle\psi|$ themselves cannot be written as a convex
combination of the others, because the rank of any nontrivial convex combination of
distinct one-dimensional projectors is at least two. This proves that they are the extremal
points of $D(\mathcal{H})$ and hence the pure states of $\mathcal{S}(\mathcal{H})$. □
Every mixed state can be written as a convex combination of projectors but,
contrary to what happens in classical models, this decomposition is not unique. We
shall make this clear in example 12.

Every one-dimensional projector can be associated with its one-dimensional image in
$\mathcal{H}$. We can identify this one-dimensional space with an equivalence class of unit vectors
in $\mathcal{H}$ under the relation

$$|\psi\rangle \sim e^{i\theta}|\psi\rangle, \quad \theta \in \mathbb{R}.$$

This means that every pure state is given by a straight line passing through the origin
in $\mathcal{H}$. The set of these lines is the projective Hilbert space $\mathbb{P}\mathcal{H}$. If in some situation
we are restricted to pure states only, we can use $\mathbb{P}\mathcal{H}$ instead of $\mathcal{H}$ in the description
of the model [BH01, Ama06].

It is quite common to use only a unit vector to represent a pure state in quantum 
theory. This brings no difficulty if we keep in mind that each unit vector is only a 
representative of the equivalence class related to the state and that there are many unit 
vectors representing the same pure state. 


Example 12 (The quantum bit). A quantum bit, or qubit, is the system described by
a Hilbert space of dimension two. It is the quantum analogue of the classical bit, hence
its name. This analogy justifies the usual notation used for the standard basis in $\mathcal{H}$:
$\{|0\rangle, |1\rangle\}$. Any pure state of this system can be represented by a vector in $\mathcal{H}$

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \quad \alpha, \beta \in \mathbb{C}.$$

The normalized pure states satisfy the further restriction $|\alpha|^2 + |\beta|^2 = 1$.

General normalized states of a qubit are represented by $2 \times 2$ density matrices acting
in $\mathcal{H}$. The set of $2 \times 2$ Hermitian matrices is a real vector space of dimension four, and
the set of matrices given by the three Pauli matrices

$$\sigma_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \sigma_2 = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}, \quad \sigma_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix},$$

together with the identity matrix $I$, is an orthogonal basis. Hence, every density matrix
of a qubit can be written in the form

$$\rho = \frac{1}{2}\left(I + a\sigma_1 + b\sigma_2 + c\sigma_3\right).$$

The coefficient of $I$ must be 1/2 because $I$ is the only matrix in this basis with non-zero
trace, equal to two, and $\mathrm{Tr}(\rho) = 1$. The vector $[a \ b \ c]$, called the Bloch vector of the state,
has to satisfy the condition

$$a^2 + b^2 + c^2 \leq 1$$

because of the positivity of $\rho$.

This implies that there is a bijective association between normalized states of a qubit and
points in the ball of radius one in $\mathbb{R}^3$, the Bloch ball. This bijection preserves mixtures,
and points in the sphere $S^2$, the Bloch sphere, correspond to the pure states of the
system.

Figure 1.3: The Bloch sphere, a geometrical representation of the state space of one
qubit.

Including subnormalized states, the state space $\mathcal{S}$ is a cone over the Bloch ball,
which requires four dimensions to be embedded.

From this geometrical representation it is easy to see that the decomposition of a
mixed state in terms of pure states is not unique. In fact, any point in the interior of the
ball can be written as a convex combination of a finite number of points in the sphere
in many different ways.
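The Bloch parametrization and the non-uniqueness of the decomposition can be illustrated numerically. In the sketch below the function names are ours, matrices are stored as nested lists, and we use the standard fact that the eigenvalues of a qubit density matrix with Bloch vector $r$ are $(1 \pm |r|)/2$, so positivity is equivalent to $|r| \leq 1$.

```python
def density_from_bloch(a, b, c):
    """rho = (I + a*sigma_1 + b*sigma_2 + c*sigma_3) / 2 as a 2x2 nested list."""
    return [[(1 + c) / 2, (a - 1j * b) / 2],
            [(a + 1j * b) / 2, (1 - c) / 2]]

def eigenvalues(a, b, c):
    """Eigenvalues of the matrix above: (1 -/+ |r|) / 2, with r the Bloch vector."""
    r = (a * a + b * b + c * c) ** 0.5
    return (1 - r) / 2, (1 + r) / 2

def mix(weights, states):
    """Convex combination of 2x2 matrices."""
    return [[sum(w * s[i][j] for w, s in zip(weights, states)) for j in range(2)]
            for i in range(2)]

# Two different ensembles of pure states (points on the sphere) with the same mixture.
z_ensemble = [density_from_bloch(0, 0, 1), density_from_bloch(0, 0, -1)]
x_ensemble = [density_from_bloch(1, 0, 0), density_from_bloch(-1, 0, 0)]
center_z = mix([0.5, 0.5], z_ensemble)
center_x = mix([0.5, 0.5], x_ensemble)
```

Both `center_z` and `center_x` equal the maximally mixed state at the center of the ball, although they are built from different pairs of antipodal points of the sphere.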

The Bloch sphere is connected to an interesting mathematical object, called the Hopf
fibration. For more information see [BH01, Ama06, Ter07, Ama10].


1.5.1 Multipartite systems in quantum models 

A state of a multipartite system composed of subsystems 1 and 2 in quantum probability
theory is also given by a positive operator $\rho$ in a Hilbert space $\mathcal{H}_{12}$ with $\mathrm{Tr}(\rho) \leq 1$.
This Hilbert space is constructed from the Hilbert spaces of the subsystems using the
tensor product.

Assumption 12. If the Hilbert spaces of subsystems 1 and 2 are $\mathcal{H}_1$ and $\mathcal{H}_2$
respectively, then the Hilbert space of the composite system is given by

$$\mathcal{H}_{12} = \mathcal{H}_1 \otimes \mathcal{H}_2. \tag{1.18}$$

The states of the composite system are matrices in⁵ $M(\mathcal{H}_1 \otimes \mathcal{H}_2) \cong M(\mathcal{H}_1) \otimes M(\mathcal{H}_2)$.


Example 13 (Two quantum bits). The Hilbert space associated to the system of two
qubits is isomorphic to $\mathbb{C}^2 \otimes \mathbb{C}^2$ and the density matrices of this system are positive
matrices with trace one in $M(\mathbb{C}^2) \otimes M(\mathbb{C}^2)$. A basis for the real vector space of $4 \times 4$
Hermitian matrices is the set of matrices⁶ $\{I, I \otimes \sigma_i, \sigma_i \otimes I, \sigma_i \otimes \sigma_j\}$, and a density matrix
of the system of two qubits can be written in the form

$$\rho = \frac{1}{4}\left[ I + \sum_i R_{0i} \, I \otimes \sigma_i + \sum_i R_{i0} \, \sigma_i \otimes I + \sum_{ij} R_{ij} \, \sigma_i \otimes \sigma_j \right],$$

where

$$R_{ij} = \mathrm{Tr}(\sigma_i \otimes \sigma_j \, \rho).$$

This matrix can also be represented by the matrix $R$, whose entries are the coefficients
$R_{ij}$ defined above, with $R_{00} = \mathrm{Tr}(\rho) = 1$ (taking $\sigma_0 = I$).

Unfortunately, the conditions the positivity of $\rho$ imposes on the entries of $R$ are not
so easily written as in the case of one qubit. Sometimes we can focus on subsets of
the set of density matrices, decreasing the number of parameters in the problem and
simplifying the analysis [Ama10].

In $\mathcal{S}(\mathcal{H}_{12})$, we distinguish three kinds of density matrices.

Definition 30. We say that a state $\rho \in D(\mathcal{H}_{12})$ is a product state if

$$\rho = \rho_1 \otimes \rho_2$$

with $\rho_1 \in D(\mathcal{H}_1)$ and $\rho_2 \in D(\mathcal{H}_2)$. We say that $\rho$ is a separable state if it can be written
as a convex combination of product states:

$$\rho = \sum_i p_i \, \rho_1^i \otimes \rho_2^i, \quad p_i \geq 0, \quad \sum_i p_i = 1, \tag{1.19}$$

with $\rho_1^i \in D(\mathcal{H}_1)$ and $\rho_2^i \in D(\mathcal{H}_2)$. The density matrices that cannot be written as in
(1.19) are called entangled states.

⁵ The isomorphism we use in this identification is positive and trace preserving. For this reason, the
states of the composite system are given by positive matrices with trace bounded by one in
$M(\mathcal{H}_1) \otimes M(\mathcal{H}_2)$.

⁶ We will use the letter $I$ for the identity matrix of every dimension.

Example 14 (Entangled states of two qubits). The simplest non-trivial composite
quantum system is the system of two qubits. The pure separable states of this system are
given by vectors of the form

$$|\Psi\rangle = |\psi_1\rangle \otimes |\psi_2\rangle,$$

where the $|\psi_j\rangle$ represent states of a qubit. Hence, every pure separable state is of the form

$$\alpha_1\alpha_2|00\rangle + \alpha_1\beta_2|01\rangle + \beta_1\alpha_2|10\rangle + \beta_1\beta_2|11\rangle, \tag{1.20}$$

with $\alpha_i, \beta_i \in \mathbb{C}$ and $|\alpha_i|^2 + |\beta_i|^2 = 1$. Very few pure states can be written this way. Indeed, the set
of pure separable states is a quadric of complex dimension two in a three dimensional
complex manifold [BH01, Ama06, Ter07]. For example, the states

$$|\Phi_\pm\rangle = \frac{|00\rangle \pm |11\rangle}{\sqrt{2}}, \qquad |\Psi_\pm\rangle = \frac{|01\rangle \pm |10\rangle}{\sqrt{2}}, \tag{1.21}$$

called the Bell states, cannot be written in the form (1.20), and hence represent entangled
states.
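Whether a pure two-qubit state has the product form (1.20) can be tested with a single equation. The sketch below relies on the standard fact that a vector with coefficients $c_{00}, c_{01}, c_{10}, c_{11}$ is a product vector if and only if $c_{00}c_{11} - c_{01}c_{10} = 0$; this is our way of making the quadric mentioned above explicit, and the function names are hypothetical.

```python
from math import sqrt

def is_product(c00, c01, c10, c11, tol=1e-12):
    """A two-qubit vector is a product vector iff c00*c11 - c01*c10 vanishes."""
    return abs(c00 * c11 - c01 * c10) < tol

def product_vector(alpha1, beta1, alpha2, beta2):
    """Coefficients of (alpha1|0> + beta1|1>) x (alpha2|0> + beta2|1>), as in (1.20)."""
    return (alpha1 * alpha2, alpha1 * beta2, beta1 * alpha2, beta1 * beta2)

s = 1 / sqrt(2)
bell_states = {
    "Phi+": (s, 0, 0, s), "Phi-": (s, 0, 0, -s),
    "Psi+": (0, s, s, 0), "Psi-": (0, s, -s, 0),
}
```

Every vector returned by `product_vector` passes the test, while each of the four Bell states gives $|c_{00}c_{11} - c_{01}c_{10}| = 1/2$ and therefore fails it.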

Deciding if a mixed state is entangled or not is also easy for this system. Let

$$T: M(\mathcal{H}) \to M(\mathcal{H}), \qquad \rho \mapsto \rho^T \tag{1.22}$$

be the transposition map for a fixed basis and $T \otimes I$ its extension to a composite system,
called partial transposition. We have the following result:


Theorem 10 (Peres-Horodecki criterion [Per96, HHH96]). A density matrix $\rho$ of a two
qubit system is separable iff its partial transposition is a density matrix.
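One direction of the criterion can be seen at work on a Bell state. The sketch below applies the partial transposition $T \otimes I$ (transposing the first qubit) to the projector onto $(|00\rangle + |11\rangle)/\sqrt{2}$ and exhibits a vector on which the result has negative expectation value, so the partial transposition is not a positive matrix. The helper names and the index convention (row index $2i + j$ for the basis vector $|ij\rangle$) are assumptions of this illustration.

```python
def outer(v):
    """Projector |v><v| for a vector v given as a list of complex amplitudes."""
    return [[x * y.conjugate() for y in v] for x in v]

def partial_transpose(rho):
    """Apply T x I: entry ((i,j),(k,l)) of the result is entry ((k,j),(i,l)) of rho."""
    out = [[0] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            for k in range(2):
                for l in range(2):
                    out[2 * i + j][2 * k + l] = rho[2 * k + j][2 * i + l]
    return out

def expectation(m, v):
    """<v| m |v>, which must be non-negative for every v if m is positive."""
    return sum(v[r].conjugate() * m[r][c] * v[c] for r in range(4) for c in range(4)).real

s = 2 ** -0.5
phi_plus = [complex(s), 0j, 0j, complex(s)]     # (|00> + |11>)/sqrt(2)
psi_minus = [0j, complex(s), complex(-s), 0j]   # (|01> - |10>)/sqrt(2)
pt = partial_transpose(outer(phi_plus))
witness = expectation(pt, psi_minus)
```

The value of `witness` is negative, so the partially transposed matrix is not a density matrix and, by Theorem 10, the Bell state is entangled. The partial transposition still has unit trace, which shows that positivity is the only property that can fail.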


For other composite systems of higher dimension, the partial transposition of every
separable density matrix is also a density matrix, but the converse does not hold, unless
one of the subsystems has dimension three and the other has dimension two. In the remaining
cases, deciding if a state is separable or not is not easy. For more information on
separability criteria, see [NC00, BZ06, HHHH09, Ama10] and references therein.

Entangled states are responsible for many interesting features in quantum theory and
also play an important role in many protocols that give us strong evidence that quantum
information is more powerful than classical information. For example, entanglement is
the key resource in superdense coding [BW92], teleportation [BBC+93], quantum
cryptography (see [Wikf, HHHH09] and references therein), and Deutsch's and Shor's algorithms
[DJ92, Sho99], just to cite a few examples. Not all entangled states are useful for all
tasks: the performance of a given state depends on the degree of entanglement it
possesses in a very subtle way. Large amounts of entanglement are not necessarily good.
Quantifying entanglement is then very important, but unfortunately it is a very hard
task. There are many entanglement quantifiers, and they do not define the same
pre-order in the set of density matrices of a system. The reader can find an introduction to
entanglement quantifiers in [NC00, BZ06, HHHH09, Ama10] and references therein.

As a corollary of assumption [ 7 ] and the no-signaling principle, given a state of a 
composite system, we can associate a reduced state to every subsystem. In a quantum 
model, each reduced state is given by a density matrix in the corresponding state space. 


Definition 31 (Reduced density matrices). Given a multipartite system composed of
subsystems 1 and 2 in a state $\rho_{12}$, the reduced states of 1 and 2 are given by

\[ \rho_1 = \mathrm{Tr}_2(\rho_{12}), \qquad \rho_2 = \mathrm{Tr}_1(\rho_{12}), \]

where the maps $\mathrm{Tr}_1 = \mathrm{Tr} \otimes I$ and $\mathrm{Tr}_2 = I \otimes \mathrm{Tr}$, with $\mathrm{Tr}\colon M(\mathcal{H}) \to \mathbb{C}$ denoting
the usual trace functional over the space of operators, are called partial traces. The
matrix $\rho_i$ is called the reduced density matrix of subsystem $i$.
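The partial traces of Definition 31 can be sketched numerically for two qubits. The snippet below is only an illustration: the basis ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$, the helper name `partial_trace` and the choice of the singlet state as input are assumptions made here, not part of the text.

```python
# Partial traces of a two-qubit density matrix stored as a 4x4 nested list.
# Assumed basis order: |00>, |01>, |10>, |11>, so |a b> has index 2*a + b.

def partial_trace(rho, keep):
    """Return the 2x2 reduced matrix of subsystem `keep` (1 or 2)."""
    red = [[0j, 0j], [0j, 0j]]
    for a in range(2):
        for b in range(2):
            for k in range(2):
                if keep == 1:
                    # Tr_2: sum over the index k of the second qubit
                    red[a][b] += rho[2 * a + k][2 * b + k]
                else:
                    # Tr_1: sum over the index k of the first qubit
                    red[a][b] += rho[2 * k + a][2 * k + b]
    return red

# Singlet state (|01> - |10>)/sqrt(2); amplitudes are real, so no conjugation is needed
s = 2 ** -0.5
psi = [0.0, s, -s, 0.0]
rho_12 = [[x * y for y in psi] for x in psi]

rho_1 = partial_trace(rho_12, keep=1)
rho_2 = partial_trace(rho_12, keep=2)
```

For this input both reduced states come out as the maximally mixed state $I/2$, even though the global state is pure.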


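Example 15. It was proved in example 13 that a density matrix of the system of two
qubits can be written in the form

\[ \rho = I + \sum_i R_{0i}\, I\otimes\sigma_i + \sum_i R_{i0}\, \sigma_i\otimes I + \sum_{i,j} R_{ij}\, \sigma_i\otimes\sigma_j, \]

where

\[ R_{ij} = \mathrm{Tr}\left(\sigma_i\otimes\sigma_j\, \rho\right). \]

This state can also be represented by the $4\times 4$ matrix $R$, whose entries are the coefficients
$R_{ij}$ defined above, with $R_{00} = 1/4$.

Using the partial trace, we find that

\[ \begin{bmatrix} R_{01} & R_{02} & R_{03} \end{bmatrix} \]

is the Bloch vector of the second qubit, while

\[ \begin{bmatrix} R_{10} & R_{20} & R_{30} \end{bmatrix} \]

is the Bloch vector of the first qubit.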

In this section we have discussed results related to bipartite systems, but all of them
can be generalized to systems with more parties. The state space has a much richer
structure in those cases, and finding separability criteria and entanglement quantifiers is
even harder [HHHH09].


1.5.2 Transformations 

The set of allowed transformations $\mathcal{T}(\mathcal{H})$ in a quantum model corresponds to the largest
set of linear transformations acting on the set of operators in $\mathcal{H}$ such that constraint [1]
is satisfied.

Let

\[ \Phi\colon M(\mathcal{H}) \to M(\mathcal{H}), \qquad \rho \mapsto \rho', \]

be a linear map in $M(\mathcal{H})$. Let us now verify what conditions are imposed on $\Phi$ by
constraint [1].

Suppose $\mathcal{H}$ is a Hilbert space of complex dimension $d$. The elements of $M(\mathcal{H})$ can
be written as $d \times d$ matrices and the elements of $\mathcal{T}(\mathcal{H})$ can be written as $d^2 \times d^2$
matrices. We will use two indices to write the components of a matrix in $M(\mathcal{H})$ and four
indices to write the components of a matrix in $\mathcal{T}(\mathcal{H})$. Then, if $\rho' = \Phi(\rho)$, we have

\[ \rho'_{m\mu} = \sum_{n\nu} \Phi_{\substack{m\mu \\ n\nu}}\, \rho_{n\nu}. \]


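Let us see what we can say about the components of the map $\Phi$. We require that
$\Phi$ takes states in $\mathcal{S}(\mathcal{H})$ to states in $\mathcal{S}(\mathcal{H})$, and this implies that a number of properties
must hold for $\Phi$. The first one is that $\rho' = \Phi(\rho)$ must be a Hermitian matrix for every
state $\rho$:

\[ \rho'_{m\mu} = \left(\rho'_{\mu m}\right)^{*}. \]

This implies that

\[ \sum_{n\nu} \Phi_{\substack{m\mu \\ n\nu}}\, \rho_{n\nu} = \sum_{n\nu} \Phi^{*}_{\substack{\mu m \\ \nu n}}\, \rho^{*}_{\nu n} = \sum_{n\nu} \Phi^{*}_{\substack{\mu m \\ \nu n}}\, \rho_{n\nu}, \tag{1.23} \]

which holds for all choices of $\rho$ only if

\[ \Phi_{\substack{m\mu \\ n\nu}} = \Phi^{*}_{\substack{\mu m \\ \nu n}}. \tag{1.24} \]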

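The second condition we have to impose is that $\mathrm{Tr}(\rho') \le 1$. Then

\[ \sum_m \rho'_{mm} = \sum_m \sum_{n\nu} \Phi_{\substack{mm \\ n\nu}}\, \rho_{n\nu} \le 1. \tag{1.25} \]

Let $\{|1\rangle, \ldots, |d\rangle\}$ be an orthonormal basis for $\mathcal{H}$ and define $\rho_n = |n\rangle\langle n|$, $1 \le n \le d$. Using
$\rho = \rho_n$ in equation (1.25), we conclude that

\[ \sum_m \Phi_{\substack{mm \\ nn}} \le 1. \tag{1.26} \]

Using $\rho$ as the matrix with all components equal to zero except $\rho_{nn}$, $\rho_{\nu\nu}$, $\rho_{n\nu}$, $\rho_{\nu n}$, which
are all equal, we conclude also that

\[ \sum_m \Phi_{\substack{mm \\ nn}} + \sum_m \Phi_{\substack{mm \\ n\nu}} + \sum_m \Phi_{\substack{mm \\ \nu n}} + \sum_m \Phi_{\substack{mm \\ \nu\nu}} \le 2. \tag{1.27} \]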

When $\Phi$ preserves the norm of the states, the same calculation shows that

\[ \sum_m \rho'_{mm} = \sum_m \sum_{n\nu} \Phi_{\substack{mm \\ n\nu}}\, \rho_{n\nu} = 1. \tag{1.28} \]

Using $\rho$ as the matrix with $\rho_{nn} = \rho_{\nu\nu}$, $\rho_{n\nu} = \rho^{*}_{\nu n} = i\rho_{nn}$ and all other entries equal to
zero, we get one extra constraint that implies the following condition:

\[ \sum_m \Phi_{\substack{mm \\ n\nu}} = \delta_{n\nu}. \tag{1.29} \]

The elements of $\mathcal{T}(\mathcal{H})$ that preserve the norm of the states are called trace-preserving
maps. The maps that do not increase the norm of any state are called trace
non-increasing maps.

The next requirement we impose is that if $\rho$ is a positive matrix, then $\Phi(\rho)$ must
also be positive.

Definition 32. A map $\Phi\colon M(\mathcal{H}) \to M(\mathcal{H})$ is called positive if the image of a positive
matrix under $\Phi$ is also a positive matrix.

Every map $\Phi \in \mathcal{T}(\mathcal{H})$ is a positive map. The converse does not hold, as we will see
in a moment.

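To help in the characterization of positive maps, we define the dynamical matrix of
$\Phi$ as the $d^2 \times d^2$ matrix with entries

\[ D_{\substack{mn \\ \mu\nu}} = \Phi_{\substack{m\mu \\ n\nu}}. \tag{1.30} \]

When $\Phi$ is a Hermitian map, its dynamical matrix is also Hermitian. When $\Phi$ is
trace non-increasing we have

\[ \sum_m D_{\substack{mn \\ mn}} \le 1, \qquad \sum_m D_{\substack{mn \\ mn}} + \sum_m D_{\substack{mn \\ m\nu}} + \sum_m D_{\substack{m\nu \\ mn}} + \sum_m D_{\substack{m\nu \\ m\nu}} \le 2, \tag{1.31} \]

and when $\Phi$ is trace preserving we have

\[ \sum_m D_{\substack{mn \\ m\nu}} = \delta_{n\nu}. \tag{1.32} \]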

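Let us now see what the consequences of the positivity of $\Phi$ are for the dynamical
matrix $D$. Suppose $\rho$ is a pure state. Then $\rho = |\phi\rangle\langle\phi|$ and $\rho_{m\mu} = \phi_m \phi^{*}_\mu$. When $\Phi$ is
positive, $\rho'$ is positive and then, for all $|\psi\rangle \in \mathcal{H}$,

\[ 0 \le \langle\psi|\rho'|\psi\rangle = \sum_{m\mu} \psi^{*}_m\, \rho'_{m\mu}\, \psi_\mu = \sum_{m\mu n\nu} \psi^{*}_m \phi_n\, D_{\substack{mn \\ \mu\nu}}\, \psi_\mu \phi^{*}_\nu = \langle\phi^{*}|\langle\psi| D |\psi\rangle|\phi^{*}\rangle. \]

Then, if $\Phi$ is a positive map, $\langle\phi^{*}|\langle\psi| D |\psi\rangle|\phi^{*}\rangle \ge 0$ for all $|\phi\rangle, |\psi\rangle \in \mathcal{H}$.

Definition 33. A $d^2 \times d^2$ matrix $D$ is called block positive if

\[ \langle\phi^{*}|\langle\psi| D |\psi\rangle|\phi^{*}\rangle \ge 0 \quad \forall\, |\phi\rangle, |\psi\rangle \in \mathcal{H}. \]

Then, if $\Phi$ is a positive map, $D$ is a block-positive matrix. This condition is also
sufficient.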

Theorem 11 (Jamiołkowski). A linear map $\Phi\colon M(\mathcal{H}) \to M(\mathcal{H})$ is positive iff its
dynamical matrix is block positive.

The proof of this result can be found in references [BZ06, Ama10].

As we already discussed, the condition that $\Phi$ takes states to states in
$\mathcal{S}(\mathcal{H})$ is not sufficient to consider $\Phi$ an allowed transformation. The constraint of
complete state preservation requires that this must also happen when the system is part
of a multipartite system.

Definition 34. Let $\Phi$ be a positive map acting on $M(\mathcal{H})$. Let $\mathcal{H}'$ be any other vector
space of dimension $k$ and $I$ be the identity map acting on $M(\mathcal{H}')$. If the map $\Phi \otimes I$,
acting on $M(\mathcal{H} \otimes \mathcal{H}')$, is positive, we say that $\Phi$ is $k$-positive. If $\Phi$ is $k$-positive for
every $k \in \mathbb{N}$, we say that $\Phi$ is completely positive.

We have seen that in classical theories every state-preserving transformation is
automatically completely state preserving. This is a consequence of the fact that every state
can be written as a convex combination of product states. The existence of entangled states
in quantum theory implies, among many other interesting things, that there are many
state-preserving maps, namely trace non-increasing positive maps acting on $M(\mathcal{H})$,
that are not completely state preserving.


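Example 16. Not every positive map is completely positive. For example, consider the
transposition map $T$ acting on the state space of one qubit. This map is positive, but

\[ T \otimes I \left(|\Psi_-\rangle\langle\Psi_-|\right) = T \otimes I \left( \frac{1}{2} \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \right) = \frac{1}{2} \begin{pmatrix} 0 & 0 & 0 & -1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -1 & 0 & 0 & 0 \end{pmatrix}, \]

which is not positive.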
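The computation in Example 16 can be checked with a few lines of plain Python. This is a minimal sketch under assumptions made here: the basis order $|00\rangle, |01\rangle, |10\rangle, |11\rangle$, the helper names, and the use of the vector $(|00\rangle + |11\rangle)/\sqrt{2}$ as a witness of non-positivity are choices of this illustration.

```python
def partial_transpose_first(rho):
    """Apply T (x) I to a 4x4 matrix: <a b|rho|c d> -> <c b|rho|a d>."""
    out = [[0.0] * 4 for _ in range(4)]
    for a in range(2):
        for b in range(2):
            for c in range(2):
                for d in range(2):
                    out[2 * a + b][2 * c + d] = rho[2 * c + b][2 * a + d]
    return out

def expectation(vec, mat):
    """<vec|mat|vec> for a real vector and a real matrix."""
    return sum(vec[i] * mat[i][j] * vec[j] for i in range(4) for j in range(4))

s = 2 ** -0.5
singlet = [0.0, s, -s, 0.0]                      # (|01> - |10>)/sqrt(2)
rho = [[x * y for y in singlet] for x in singlet]
rho_pt = partial_transpose_first(rho)

# A positive matrix has non-negative expectation on every vector;
# this one gives a negative value, so rho_pt is not positive.
witness = [s, 0.0, 0.0, s]                       # (|00> + |11>)/sqrt(2)
value = expectation(witness, rho_pt)
```

The trace is unchanged by the partial transposition, but the expectation value on the witness vector is $-1/2$, in agreement with the matrix displayed in the example.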


If $\Phi$ belongs to $\mathcal{T}(\mathcal{H})$, condition [4] implies that $\Phi \otimes I$ also takes states to states
in $\mathcal{S}(\mathcal{H} \otimes \mathcal{H}')$ for every Hilbert space $\mathcal{H}'$. This means that $\Phi$ must be a completely
positive map.


Theorem 12. The set of allowed transformations $\mathcal{T}(\mathcal{H})$ is the set of trace
non-increasing completely positive maps acting on $M(\mathcal{H})$.


We can also use the dynamical matrix $D$ to find necessary and sufficient conditions
for the complete positivity of $\Phi$.

Theorem 13 (Choi). A map $\Phi$ acting on $M(\mathcal{H})$ is completely positive iff the
corresponding dynamical matrix $D$ is positive.
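As a sanity check of Theorems 11 and 13, one can build the dynamical matrix of the qubit transposition map directly from equation (1.30). The sketch below is illustrative only: the index flattening `d * m + n`, the helper names and the random sampling of product vectors are assumptions of this example, and sampling can only support block positivity, not prove it.

```python
import random

d = 2

def phi_transpose(m, mu, n, nu):
    """Components of transposition: rho'_{m mu} = rho_{mu m}."""
    return 1.0 if (n == mu and nu == m) else 0.0

# Dynamical matrix, eq. (1.30): D[(m n), (mu nu)] = Phi[(m mu), (n nu)]
D = [[0.0] * (d * d) for _ in range(d * d)]
for m in range(d):
    for n in range(d):
        for mu in range(d):
            for nu in range(d):
                D[d * m + n][d * mu + nu] = phi_transpose(m, mu, n, nu)

def quad_form(vec, mat):
    """<vec|mat|vec> for a complex vector."""
    size = len(vec)
    return sum(vec[i].conjugate() * mat[i][j] * vec[j]
               for i in range(size) for j in range(size))

def random_vector(dim):
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(dim)]

# Block positivity (Definition 33), sampled on product vectors |psi> (x) |phi*>
random.seed(0)
block_values = []
for _ in range(200):
    psi, phi = random_vector(d), random_vector(d)
    product = [psi[m] * phi[n].conjugate() for m in range(d) for n in range(d)]
    block_values.append(quad_form(product, D).real)

# Positivity fails on the antisymmetric vector (|01> - |10>)/sqrt(2)
s = 2 ** -0.5
negative_value = quad_form([0j, s + 0j, -s + 0j, 0j], D).real
```

For transposition, $D$ is the swap matrix: every sampled product vector gives a non-negative value, consistent with the map being positive, while the antisymmetric vector gives $-1$, so $D$ is not positive and the map is not completely positive.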

Using this theorem it is possible to prove that completely positive maps can be written 
in a simple way using the Kraus representation. 

Theorem 14 (Kraus representation). A linear map $\Phi$ is completely positive iff it can
be written in the form

\[ \rho \mapsto \rho' = \sum_i A_i\, \rho\, A_i^{\dagger}, \]

where each $A_i$ is a square matrix of the same size as $\rho$. Furthermore, $\Phi$ is trace preserving
iff the matrices $A_i$ satisfy

\[ \sum_i A_i^{\dagger} A_i = I. \]

For proofs of these results, see [BZ06, Ama10].
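A small numerical illustration of the Kraus form may help. The sketch below uses the amplitude-damping channel on one qubit, which is not discussed in the text and is chosen here only as a familiar example; the damping strength `gamma` and all helper names are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))] for i in range(len(a))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def apply_kraus(kraus_ops, rho):
    """rho -> sum_i A_i rho A_i^dagger."""
    out = [[0j] * len(rho) for _ in rho]
    for a in kraus_ops:
        out = add(out, matmul(matmul(a, rho), dagger(a)))
    return out

gamma = 0.3  # hypothetical damping strength
A0 = [[1.0, 0.0], [0.0, math.sqrt(1 - gamma)]]
A1 = [[0.0, math.sqrt(gamma)], [0.0, 0.0]]
kraus_ops = [A0, A1]

# Trace-preservation condition: sum_i A_i^dagger A_i should be the identity
completeness = [[0j, 0j], [0j, 0j]]
for a in kraus_ops:
    completeness = add(completeness, matmul(dagger(a), a))

rho = [[0.5, 0.5], [0.5, 0.5]]        # pure state (|0> + |1>)/sqrt(2)
rho_out = apply_kraus(kraus_ops, rho)
```

Since the completeness sum equals the identity, the output keeps unit trace, while population is moved from $|1\rangle$ to $|0\rangle$.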
1.5.3 Measurements

By definition [9], measurements in quantum models are given by a set of trace
non-increasing completely positive maps $\{\Phi_1, \Phi_2, \ldots, \Phi_n\}$ such that

\[ \sum_i \mathrm{Tr}\left[\Phi_i(\rho)\right] = \mathrm{Tr}(\rho) \tag{1.33} \]

for every state $\rho$.

There are two important special cases: POVMs and projective measurements.

Definition 35. A positive-operator valued measurement (POVM) is a measurement
$\{\Phi_1, \Phi_2, \ldots, \Phi_n\}$ in which each transformation $\Phi_i$ is given by

\[ \Phi_i(\rho) = M_i\, \rho\, M_i^{\dagger}, \tag{1.34a} \]

where the $M_i$ are matrices in $M(\mathcal{H})$ satisfying

\[ \sum_i M_i^{\dagger} M_i = I. \tag{1.34b} \]

The probability of outcome $i$ for the state $\rho$ is

\[ p_i = \mathrm{Tr}\left(M_i^{\dagger} M_i\, \rho\right), \tag{1.34c} \]

and the unnormalized state after outcome $i$ is

\[ \rho_i = M_i\, \rho\, M_i^{\dagger}. \tag{1.34d} \]

A POVM is defined if we give a set of matrices $\{M_1, M_2, \ldots, M_n\}$ satisfying equation
(1.34b). Theorem 14 implies that every measurement in quantum mechanics is the
coarse graining of a POVM.
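Equations (1.34b) to (1.34d) can be tried out on a two-outcome measurement of a qubit. In the sketch below the matrices $M_0$, $M_1$, the angle `theta` and the input state are hypothetical choices made for illustration; the matrices are real, so the adjoint reduces to the transpose.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

theta = math.pi / 6                      # hypothetical sharpness parameter
c, s = math.cos(theta), math.sin(theta)
M = [
    [[c, 0.0], [0.0, s]],                # M_0
    [[s, 0.0], [0.0, c]],                # M_1
]

# Completeness, eq. (1.34b): M_0^T M_0 + M_1^T M_1 = I
effects = [matmul(transpose(m), m) for m in M]
completeness = [[effects[0][i][j] + effects[1][i][j] for j in range(2)] for i in range(2)]

rho = [[0.75, 0.25], [0.25, 0.25]]       # a mixed qubit state

probs = [trace(matmul(e, rho)) for e in effects]             # eq. (1.34c)
post = [matmul(matmul(m, rho), transpose(m)) for m in M]     # eq. (1.34d)
```

The probabilities add up to one, and the trace of each unnormalized post-measurement state equals the probability of the corresponding outcome.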


Definition 36. A measurement $\{\Phi_1, \Phi_2, \ldots, \Phi_n\}$ is called projective if it is a POVM in
which the matrices $M_i$ are projectors acting on $\mathcal{H}$. If every $M_i$ is a unidimensional
projector, the measurement is called a complete projective measurement.


A projective measurement is defined if we give a set of projectors $\{P_1, P_2, \ldots, P_n\}$
satisfying

\[ \sum_i P_i = I. \]

This implies that the $P_i$ are orthogonal projectors.

Projective measurements are the ones satisfying outcome repeatability. A curious
feature of quantum theory is that, contrary to classical theory, even when we restrict the
measurements to outcome-repeatable measurements, the pure states are not dispersion-free
states. Indeed, given a projective measurement $\{P_1, P_2, \ldots, P_n\}$, a pure state $|\psi\rangle\langle\psi|$
gives outcome $i$ with probability one iff

\[ P_i |\psi\rangle = |\psi\rangle, \]

and this happens iff $|\psi\rangle$ belongs to the subspace onto which $P_i$ projects. Of course, most
of the pure states do not satisfy this property, and hence there are different outcomes
with non-zero probability. Nevertheless, there is a difference in the behavior of pure and
mixed states when it comes to outcome definiteness.

Theorem 15. The density matrix $\rho$ represents a pure state if, and only if, there is a
complete projective measurement with probability $p_i = 1$ for some outcome $i$.

Proof. Let $\rho = |\psi\rangle\langle\psi|$. Take a complete projective measurement such that outcome $i$
is associated to the one-dimensional projector $P_i = |\psi\rangle\langle\psi|$. Then we have that $p_i = 1$.
Suppose now that

\[ \rho = \sum_j \lambda_j\, |\psi_j\rangle\langle\psi_j| \]

is a mixed state and $\{P_1, P_2, \ldots, P_n\}$ is a complete projective measurement. This means
that $P_i = |\phi_i\rangle\langle\phi_i|$ and that $\{|\phi_i\rangle\}$ is an orthonormal basis for $\mathcal{H}$. If the probability
of outcome $i$ is $p_i = 1$ for the state $\rho$, then $|\langle\phi_i|\psi_j\rangle| = 1$ for every $j$, which means that
$\rho = |\phi_i\rangle\langle\phi_i|$ is a pure state, a contradiction. □

1.5.4 Compatibility of projective measurements 

Compatibility of two outcome-repeatable measurements can be easily decided in quantum 
models from the matrices defining the measurements. 

Theorem 16. Two projective measurements $\{P_1, P_2, \ldots, P_n\}$ and $\{Q_1, Q_2, \ldots, Q_m\}$ are
compatible iff $P_i$ and $Q_j$ commute for every $1 \le i \le n$ and $1 \le j \le m$.

Proof. The measurements are compatible if they are both coarse grainings of the same 
complete projective measurement. This happens iff all Pi and Qj are simultaneously 
diagonalized, and hence, iff they commute. □ 
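The commutation test of Theorem 16 is easy to sketch numerically. The projectors below are hypothetical examples chosen for illustration: two coarse grainings of the same basis measurement on a qutrit, and the two qubit measurements in the computational basis and in the basis $(|0\rangle \pm |1\rangle)/\sqrt{2}$.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def commute(a, b, tol=1e-12):
    """True if the square matrices a and b satisfy ab = ba up to tol."""
    ab, ba = matmul(a, b), matmul(b, a)
    size = len(a)
    return all(abs(ab[i][j] - ba[i][j]) <= tol for i in range(size) for j in range(size))

def compatible(ps, qs):
    """Criterion of Theorem 16: every P_i commutes with every Q_j."""
    return all(commute(p, q) for p in ps for q in qs)

def diag(*entries):
    size = len(entries)
    return [[float(entries[i]) if i == j else 0.0 for j in range(size)] for i in range(size)]

# Qutrit: both measurements are coarse grainings of the same basis measurement
P = [diag(1, 0, 0), diag(0, 1, 1)]
Q = [diag(1, 1, 0), diag(0, 0, 1)]

# Qubit: computational basis versus the basis (|0> +/- |1>)/sqrt(2)
Z = [diag(1, 0), diag(0, 1)]
X = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, -0.5], [-0.5, 0.5]]]

qutrit_pair_compatible = compatible(P, Q)
qubit_pair_compatible = compatible(Z, X)
```

The first pair passes the test and the second does not, matching the fact that the two qubit bases cannot be diagonalized simultaneously.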

1.5.5 Expectation value of a measurement 

In classical probability theory, the concept of random variable, a real-valued function
defined on the sample space $\Omega$, is a useful tool that allows the definition of many important
quantities such as expectation values and variances. Something similar can be done in
generalized probabilistic theories. We simply label the outcomes of a measurement by
real numbers, and then we are able to define the same quantities, related to the value
of each outcome and the corresponding probabilities.

Definition 37. The expectation value of a measurement $M$ with outcomes $a_i \in \mathbb{R}$ in a
state $\rho$ is

\[ \langle M \rangle = \sum_i a_i\, p_i, \tag{1.35} \]

where $p_i$ is the probability of obtaining $a_i$ when measurement $M$ is applied on state $\rho$.
For a projective measurement in a quantum model, each outcome $a_i$ is associated to
a projector $P_i$ and the probability $p_i$ is given by

\[ p_i = \mathrm{Tr}(P_i\, \rho), \tag{1.36} \]

where $\rho$ is the operator corresponding to the state of the system. Hence, the expectation
value of a projective measurement $P$ can be easily calculated:

\[ \langle P \rangle = \sum_i a_i\, p_i = \sum_i a_i\, \mathrm{Tr}(P_i\, \rho), \]

and by the linearity of the trace

\[ \langle P \rangle = \mathrm{Tr}\Big( \sum_i a_i P_i\, \rho \Big) = \mathrm{Tr}(O \rho), \tag{1.37} \]

where $O = \sum_i a_i P_i$ is a Hermitian operator with eigenvalues $a_i$. The eigenspace
associated to $a_i$ is the subspace onto which $P_i$ projects. This operator is called the observable
associated to the measurement. This proves the following.


Theorem 17. The expectation value of an observable $O$, associated to a projective
measurement $P$, for a given state is

\[ \langle P \rangle = \mathrm{Tr}(O \rho), \tag{1.38} \]

where $\rho$ is the density operator associated to the state.


When the state is pure, $\rho = |\psi\rangle\langle\psi|$ and equation (1.38) reduces to

\[ \langle P \rangle = \langle\psi| O |\psi\rangle. \]
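The two ways of computing the expectation value, from the outcome probabilities (Definition 37) and from the observable (Theorem 17), can be compared directly. The state, the projectors and the outcome labels $\pm 1$ below are hypothetical choices for this sketch.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

# Projectors onto (|0> + |1>)/sqrt(2) and (|0> - |1>)/sqrt(2), labelled +1 and -1
projectors = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, -0.5], [-0.5, 0.5]]]
outcomes = [1.0, -1.0]

rho = [[0.75, 0.25], [0.25, 0.25]]    # a mixed qubit state

# Definition 37 with p_i = Tr(P_i rho), eq. (1.36)
probabilities = [trace(matmul(p, rho)) for p in projectors]
mean_from_probabilities = sum(a * p for a, p in zip(outcomes, probabilities))

# Theorem 17: O = sum_i a_i P_i and <P> = Tr(O rho)
observable = [[sum(a * p[i][j] for a, p in zip(outcomes, projectors)) for j in range(2)]
              for i in range(2)]
mean_from_observable = trace(matmul(observable, rho))
```

Here the observable is the Pauli matrix $\sigma_x$, and both routes give the same number, as the linearity of the trace guarantees.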


1.5.6 Processes 

The same results we presented above for transformations in $\mathcal{T}(\mathcal{H})$ can be proven for
processes, maps that change the type of system under consideration. The processes
must also obey physical requirements similar to the ones imposed on the elements of
$\mathcal{T}(\mathcal{H})$. Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be two Hilbert spaces, not necessarily of the same dimension,
and let

\[ \Lambda\colon M(\mathcal{H}_1) \to M(\mathcal{H}_2) \]

be a linear map. The definitions of positivity, $k$-positivity and complete positivity can
be generalized to this kind of map.

Definition 38. A map $\Phi\colon M(\mathcal{H}_1) \to M(\mathcal{H}_2)$ is called positive if $\Phi(\rho)$ is positive for
every positive $\rho \in M(\mathcal{H}_1)$. If

\[ \Phi \otimes I\colon M(\mathcal{H}_1 \otimes \mathcal{H}') \to M(\mathcal{H}_2 \otimes \mathcal{H}') \]

is positive, where $\mathcal{H}'$ is a Hilbert space of dimension $k$, $\Phi$ is a $k$-positive map. $\Phi$ is
called completely positive if it is a $k$-positive map for every $k$.

When the bases of $\mathcal{H}_1$ and $\mathcal{H}_2$ are fixed, we can represent the map $\Phi$ by a matrix,
which we will also denote by $\Phi$. Once more, since $\Phi$ acts in $M(\mathcal{H}_1)$, the entries of the
corresponding matrix will carry four indices. The action of $\Phi$ on a density matrix $\rho \in M(\mathcal{H}_1)$
is a density matrix $\rho' \in M(\mathcal{H}_2)$ whose entries are given by

\[ \rho'_{m\mu} = \sum_{n\nu} \Phi_{\substack{m\mu \\ n\nu}}\, \rho_{n\nu}. \]

We can also define the dynamical matrix $D$ associated to the process $\Phi$:

\[ D_{\substack{mn \\ \mu\nu}} = \Phi_{\substack{m\mu \\ n\nu}}. \]

If $\mathcal{H}_1$ and $\mathcal{H}_2$ do not have the same dimension, the matrix of $\Phi$ is not a square matrix
but the associated dynamical matrix $D$ is. If $\dim(\mathcal{H}_1) = k$ and $\dim(\mathcal{H}_2) = l$, then the
matrix of $\Phi$ is a $k^2 \times l^2$ matrix, while $D$ is a square matrix of size $kl \times kl$. The versions
of Jamiołkowski's and Choi's theorems for processes can also be proven.

Theorem 18. A linear map $\Phi\colon M(\mathcal{H}_1) \to M(\mathcal{H}_2)$ is positive iff the associated
dynamical matrix $D$ is block positive. It is completely positive iff $D$ is positive.

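The dynamical matrix can be written in terms of the action of $\Lambda \otimes I\colon M(\mathcal{H}_1 \otimes \mathcal{H}_1) \to
M(\mathcal{H}_2 \otimes \mathcal{H}_1)$ on the state $P_+ = |\Phi_+\rangle\langle\Phi_+| \in M(\mathcal{H}_1 \otimes \mathcal{H}_1)$, where

\[ |\Phi_+\rangle = \frac{1}{\sqrt{d}} \sum_{i=1}^{d} |ii\rangle, \]

$d$ being the dimension of $\mathcal{H}_1$.

Theorem 19 (Choi-Jamiołkowski isomorphism). Given a linear map $\Lambda\colon M(\mathcal{H}_1) \to M(\mathcal{H}_2)$,

\[ D_\Lambda = \Lambda \otimes I \left( |\Phi_+\rangle\langle\Phi_+| \right). \]

A proof of this result can be found in references [BZ06, Ama10].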

1.6 Final Remarks 


In this section we will discuss briefly general properties that follow from the assumptions 
we have made about the structure of general probability theories. A number of properties 
are satisfied by all of them but others are present only in specific kinds of models. 
Classical probability theory, for example, has a number of characteristics that distinguish 
it from all others. Some properties thought as special features of quantum theory are 
in fact general, and in many aspects it is classical probability theory that emerges as a 
very particular case. In this sense, many of these properties can be seen as a signature 
of the “non-classicality” of the theory, rather than a signature of its “quantumness”. For
a more detailed discussion and for the proofs of the results presented below, see reference
[Bar07].


The first one, that we already mentioned, is the fact that classical theory is the only 
one in which every mixed state can be decomposed uniquely as a convex combination of 
pure states. This is due to the fact that the state space of a classical model is a simplex, 
and this is the only convex body with this property. 

Another interesting property of classical theories is the effect of an outcome-repeatable
measurement on the system. The definition of measurement we gave includes a
transformation of the state of the system. Note that this fact by itself should not create
any panic, since even in classical probability theory the state of the system can change
after a measurement. What is special about quantum theory is that pure states can
change after a measurement, whereas in classical probability theory only mixed states
can change, as we saw in section 1.4. The classical behavior is not shared by most theories
in this framework. The same questions about the interpretation of the change of the state
after a measurement that have troubled quantum theory for so many years may show up once again.
We will not jump into the quicksand of philosophical debate here and we will assume
a clear practical position when it comes to the interpretation of our assumptions and their
consequences. Nevertheless, we mention that there is room for many different points
of view on this subject and that the reader should feel free to think about it as much as
(s)he wants [ER13].

In theorem [3] we proved that any state of a composite system can be written as a
linear combination of product states. This does not imply, and we also did not assume,
that every state can be written as a convex combination of product states. States
with this property are called separable, and the states that are not separable are called
entangled. As we saw in section 1.5, in some models there may be entangled states.
Entangled states are closely related to an interesting feature of quantum theory called
nonlocality, which we will define properly in appendix B, although they are not always
equivalent [VB14, BCP+13]. Classical probability theories do not allow entangled states
and do not exhibit nonlocality, but quantum theory and many other theories do.

Another feature of all classical theories is that they are the only ones allowing cloning
of an arbitrary pure state. A probabilistic cloning procedure is given by the following
steps: begin with a system in a pure state $\rho$; introduce an ancilla system of the same
type, prepared in a fixed pure state $\rho_0$; apply a joint transformation on the pair of
systems such that the final state is

\[ \rho \otimes \rho \]

with probability larger than zero.


Theorem 20. If in a given probability theory there is a probabilistic cloning procedure
for every model, then the theory is classical.


The proof of this result can be found in reference [Bar07].


We can recognize many properties exclusive to classical theories. This allows us to
arrive at this kind of theory if we make all the assumptions made in section [13] and 1.2
and also postulate any one of these properties that single out classical theories among the
other ones in this framework. The main question motivating this work is whether we can do
the same for quantum theory: is there any physical principle that singles out quantum
theory in the universe of all generalized probability theories? What different ways are
there of uniquely identifying quantum theory among the other theories in the framework
by adding as few extra assumptions as possible?

The features connected to the quantum character of the theories are still not
completely understood, but we believe that the study of quantum contextuality is shedding
light upon this quest.
Second Chapter

Non-contextuality inequalities


Quantum theory has an intrinsic statistical character. It does not provide the exact 
value of all measurements for any state of the system, but rather the probabilities of the 
occurrence of each possible outcome, even when the state of the system is pure. We 
have seen in section 1.5 that the expectation value of a projective measurement $P$ in a
state $\rho$ is given by

\[ \langle P \rangle = \mathrm{Tr}(O \rho), \tag{2.1} \]

where $O$ is the observable associated to the measurement. We have seen also that there
is no dispersion for $P$ iff the support of $\rho$ is contained in an eigenspace of $O$. This means
that in general, there is a statistical distribution for the outcomes of $P$, even if the state
of the system is of the form $|\psi\rangle\langle\psi|$. In this chapter we want to discuss this probabilistic
character of quantum theory, focusing only on outcome-repeatable measurements, which
means that we will work with projective measurements from now on.

Consider a set with a huge number of copies of the same system, all prepared in the
same way. Such a set will be called an ensemble. To estimate the probability distribution
of a given measurement for this preparation one can perform this measurement on several
copies, and count the relative frequencies of each outcome. For most measurements,
this distribution has dispersion larger than zero. Two possible explanations for this
indeterminacy in the outcomes of the measurements are a priori conceivable:


I. The individual systems of the ensemble are in different states, in such a way that we
could separate the copies into a number of sub-ensembles, each of them consisting of
systems in a definite state that is dispersion-free for all the measurements. The probabilistic
character of the experiments is, in this case, explained by our lack of information:
we do not know everything about the system we are measuring and hence we
cannot predict the results.

II. All individual systems are in the same pure state and that is all the information we 
can get. The laws of nature allow that different outcomes are possible even when 
we perform the same measurement in two identically prepared systems. 


In this chapter we present a number of attempts to find objective criteria which allow 
us to decide between these two options. We will see that, under some very reasonable 
circumstances, there is no way out but to accept option II. 

Before we enter the specific details of the proofs of the impossibility of option I, let
us think about why option I seems so logical to our classical minds, modeled by our daily
experience with macroscopic systems. The necessity of the use of probabilities in the
description of an experiment naturally arises from the incompleteness of our knowledge
about the parameters involved in it. Due to our classical intuition, we are used to thinking
that if we knew everything about our experiment, two repetitions of the same procedure
with exactly the same value for every possible parameter involved would have to provide the
same result at the end. It is reasonable to imagine that two replicas of the same object
will remain identical if they are subjected to exactly the same process. If this were not the
case, we would have no reason to call them identical in the first place.

Let us focus now on quantum theory and apply this reasoning to an ensemble of
systems in the same state $|\psi\rangle\langle\psi|$. Since this ensemble will exhibit dispersion for most
measurements, the elements of the ensemble could not be identical and hence they could
not all be in the same state. Hence, the state assigned to this preparation by quantum
theory cannot be everything: there are more parameters we must use in the description
of these systems in order to get dispersion-free states. These unknown parameters may
have different values in our ensemble, and the probabilistic behavior is due to our lack
of knowledge of these “hidden variables.”

This line of thought led many physicists to believe that quantum theory might be
wrong, or at least, incomplete. Since quantum theory is capable of reproducing all the
experimental data obtained in the laboratory to date, we have absolutely
no evidence that it might be wrong. Hence, our best shot is to suppose the possibility
of completing quantum theory, adding extra variables to the description of pure states,
in such a way that with all this information (pure quantum state plus extra variables) we
would be able to predict with certainty the outcome of all measurements, and in such a way
that when averaging over these extra variables we would recover the quantum predictions.
This kind of completion of quantum theory is often called a hidden-variable model.

A good example in which a similar argument applies is classical thermodynamics, 
which states physical laws involving macroscopic aspects of matter, such as pressure, 
volume and temperature. These laws do not provide all the information about the 
systems studied, since they appear when we average over a large number of atoms and 
we do not take into account individual parameters such as the position and velocity
of each atom. Although very useful for many applications, classical thermodynamics 
does not explain phenomena such as Brownian motion, which require a more complete 
treatment, provided by statistical physics. 

It happens that under the assumption of noncontextuality, hidden-variable models
compatible with quantum theory are not possible. This result is known as the
Bell-Kochen-Specker theorem. The noncontextuality hypothesis states that the value
assigned by the model to a measurement cannot depend on other compatible measurements
performed jointly.

The first proof of this result was provided by Kochen and Specker [KS67]. It is
based on a set of 117 observables with possible outcomes 0 or 1. This set is constructed
in such a way that if we assign one of these values to each of them noncontextually,
we reach a contradiction with what we expect from quantum theory. The assumption
of noncontextuality was so natural that it was only pointed out later by Bell [Bel66].
Many other proofs using the same idea have been provided, using sets with a smaller
number of observables. They have an important common feature: they are all
state-independent. This means that if we choose the set of observables as in any of these
proofs, the assignment of definite values for the corresponding projective measurements
cannot reproduce the statistics given by any quantum state when we average over all
possible values of the hidden variables. The reader interested in such proofs may find a
number of examples in appendix A.

It is possible to provide simpler state-dependent proofs of the impossibility of hidden 
variables compatible with quantum theory. The idea behind this kind of proof is to show 
that no hidden-variable model can reproduce the statistics of some measurements for a 
given state of the corresponding system. Some of these proofs use a very small number
of vectors and hence are much simpler than the state-independent ones. 


One of the most common ways to provide a state-dependent proof of the Kochen-
Specker theorem is using the so-called noncontextuality inequalities. They are linear
inequalities involving the probabilities of certain outcomes of the joint measurement of 
compatible observables that must be obeyed by any hidden-variable model and can be 
violated by quantum theory with a particular choice of state and observables. In this 
chapter we study noncontextuality inequalities and some different ways to approach the 
subject. 


One advantage of the impossibility proofs using noncontextuality inequalities is that
many of them use a small number of observables, which may make them much more
suitable for experimental implementations. The experimental verification of quantum
violations has already been performed for a number of inequalities, especially in the particular
case of Bell inequalities, which are introduced in appendix B.


Here we discuss two approaches to noncontextuality inequalities: the compatibility
hypergraph approach, in section 2.2, and the exclusivity graph approach, in section 2.9.
In section 2.1 we discuss the assumption of noncontextuality. In section 2.3 we explain
the connection between the first approach and sheaf theory. In section 2.4 we discuss
the probability distributions obtained with classical and quantum theories. In section 2.5
we define noncontextuality inequalities. The important examples of the KCBS inequality
and the n-cycle inequalities are discussed in sections 2.6 and 2.7, respectively. In section
2.8 we introduce the exclusivity graph, which is an important tool for both approaches. In
section 2.10 we define noncontextuality in the second approach and review the examples
given before in this new perspective. The graph-theoretical formulation of quantum
contextuality supplies new tools to understand the differences between quantum and
classical theories. In section 2.11 we use some of these tools to find the scenarios
exhibiting the largest quantum contextuality. We close the chapter with some final
remarks.

2.1 The assumption of noncontextuality 

Let $\{O_1, O_2, \ldots, O_m\}$ be a set of compatible measurements. Such a set will be called a
context. Let $\{O_1, O'_2, \ldots, O'_n\}$ be another context containing $O_1$ and such that $O_i$ and $O'_j$
are not necessarily compatible. The compatibility between the elements of each context
implies that they have a common refinement, which allows us to design an experiment
in which all of them can be jointly measured. A hidden-variable model must provide a
definite outcome for this measurement and hence the model provides a set of definite
outcomes for each context.

Definition 39. A hidden-variable model for a system is a set of extra variables $\Lambda$ and a
rule that specifies for each pair $(\rho, \lambda)$, where $\rho$ is a pure state of the system and $\lambda \in \Lambda$,
a definite set of outcomes for every maximal context¹ $\{O_1, O_2, \ldots, O_m\}$.

Some authors consider hidden-variable models that are not deterministic, that is, 
the measurements may not have definite outcomes for every state. Nonetheless, the 
“non-determinism” in those models comes from the fact that we do not know everything 
about the system, and hence they can be completed to give a deterministic model. We 
will not consider this kind of model in this text. 

Suppose now that a hidden-variable model is provided for the system. Such a model
assigns a string of definite values to both $\{O_1, O_2, \ldots, O_m\}$ and $\{O_1, O'_2, \ldots, O'_n\}$. We
demand that the value assigned to $O_1$ be independent of the context in which it
appears: if the outcome of $O_1$ according to the model is $o_1$ when a joint measurement
of $\{O_1, O_2, \ldots, O_m\}$ is performed, the same outcome $o_1$ must be assigned to $O_1$ by the
model if we jointly measure $\{O_1, O'_2, \ldots, O'_n\}$.

Definition 40. We say that a hidden-variable model is noncontextual if the value
associated by the model to an observable $O$ is independent of whether, and which, compatible
measurements are performed jointly.
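Definition 40 can be phrased as a simple consistency check. In the sketch below, a model for one fixed pair $(\rho, \lambda)$ is represented as a dictionary from contexts to outcome assignments; this data layout, the function name `is_noncontextual` and the measurement labels are all hypothetical and serve only to illustrate the definition.

```python
def is_noncontextual(model):
    """Check that each measurement receives the same value in every context.

    `model` maps each context (a tuple of measurement names) to a dict
    {measurement: outcome} for one fixed pure state and hidden variable.
    """
    assigned = {}
    for context, outcomes in model.items():
        for measurement, value in outcomes.items():
            # setdefault stores the first value seen and returns the stored one
            if assigned.setdefault(measurement, value) != value:
                return False
    return True

# O1 appears in two contexts and keeps the same outcome: noncontextual
noncontextual_model = {
    ("O1", "O2"): {"O1": 1, "O2": 0},
    ("O1", "O2'"): {"O1": 1, "O2'": 0},
}

# O1 changes its outcome depending on what is measured with it: contextual
contextual_model = {
    ("O1", "O2"): {"O1": 1, "O2": 0},
    ("O1", "O2'"): {"O1": 0, "O2'": 1},
}

first_ok = is_noncontextual(noncontextual_model)
second_ok = is_noncontextual(contextual_model)
```

In a noncontextual model the context label carries no information, so the assignment could equally be stored as a single function from measurements to outcomes.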

This observation was first pointed out by Bell [Bel66], who argued that there is no
a priori reason to require noncontextuality from a hidden-variable model. Suppose we
perform the measurement of an observable $O_1$ and together with it one may choose to measure
either $\{O_2, \ldots, O_m\}$ or $\{O'_2, \ldots, O'_n\}$, both compatible with $O_1$ but not with one another.
These different possibilities may require completely different experimental arrangements,
and hence to demand that the values associated to $O_1$ be the same cannot be physically
justified. The outcome of a measurement may depend not only on the state of the
system, but also on the apparatus used to measure it.

Although the measurement process and the interaction between system and apparatus
are important issues in quantum theory, this is not the problem here, since we could
include all variables of the apparatus in the model, and apply the same reasoning again.
The point that makes the noncontextuality assumption plausible is that there is no need
to measure the compatible observables simultaneously. Suppose we measure $O_1$ and then
we choose what else we are going to measure, $\{O_2, \ldots, O_m\}$ or $\{O'_2, \ldots, O'_n\}$, or even whether we
are going to measure anything else at all. The hidden-variable model should predict the outcome
of $O_1$, but if this model is contextual this value would depend on a measurement that
will be performed in the future or, even worse, on a decision to measure or not, yet to
be made!

1 We say that a context is maximal if there is no other set of compatible measurements that contains
it properly.

Another way to enforce the noncontextuality assumption naturally is to design the
experiment in such a way that the choice of $\{O_2, \ldots, O_m\}$ or
$\{O'_2, \ldots, O'_n\}$ is made in a different region of space, within a time interval
that forbids any signal from being sent from one region to the other. Since no signal
was sent, the choice of what is going to be measured in one part cannot disturb what
happens in the other, which requires the model to be noncontextual. In this situation,
we say that the model is local and the noncontextuality assumption is usually referred
to as the locality assumption. We talk about this special case in appendix B.

2.2 Contextuality: the compatibility hypergraph approach

Suppose an experimentalist has many possible measurements to carry out on a physical
system. Each measurement has a number of possible outcomes, which occur with a certain
probability for a given state of the system.

Definition 41. Let X denote the set of possible measurements available. A compatibility
cover $\mathcal{C}$ is a family of subsets of X such that

1. Each $C \in \mathcal{C}$ is a set of compatible measurements;

2. $\bigcup_{C \in \mathcal{C}} C = X$;

3. $C, C' \in \mathcal{C}$ and $C \subseteq C'$ implies $C = C'$.

As we mentioned previously, each $C \in \mathcal{C}$ is called a context. Condition 3 is
called the anti-chain condition and it guarantees that all contexts in $\mathcal{C}$ are
maximal.

We will assume without loss of generality that all measurements have the same 
number of outcomes. The set of possible outcomes will be denoted by O. We remark 
here that the actual labels given to the outcomes are not important. The only important 
thing in what follows is the number of elements in O. 

Definition 42. A triple $(X, \mathcal{C}, O)$ is called a compatibility scenario.²

The compatibility relations among the elements of X can be represented with the 
help of a hypergraph. 

2 In this thesis, we will often use the word scenario instead of compatibility scenario.

2. Non-contextuality inequalities


Definition 43. The compatibility hypergraph of a scenario $(X, \mathcal{C}, O)$ is a
hypergraph such that the vertices are the measurements in X and the hyperedges are the
contexts $C \in \mathcal{C}$.

Notice that the compatibility hypergraph does not suffice to identify the scenario,
since the number of outcomes for each measurement is not determined. For a given
subset $C \in \mathcal{C}$, consider the set of possible outcomes for a joint measurement
of the elements of C. This set is the Cartesian product of |C| copies of O and will be
denoted by $O^C$. This set can be identified with the set of functions

$$ \lambda : C \to O. $$

Each function $\lambda \in O^C$ is called a section over C.

When a system is prepared in a given state and the measurements in C are performed
subsequently, a set of outcomes in $O^C$ will be observed. This individual run of the
experiment will be called an event. Each event is an element of $O^C$ and hence is
represented by a section over C.

Definition 44. A probability distribution p for $\mathcal{C}$ is a family of functions
$p_C : O^C \to [0,1]$ such that $\sum_{s \in O^C} p_C(s) = 1$ for each $C \in \mathcal{C}$.

Each probability distribution can be associated to a vector $p \in \mathbb{R}^n$,
$n = \sum_{C \in \mathcal{C}} |O^C|$. If we have $\mathcal{C} = \{C_1, C_2, \ldots, C_m\}$
and for each $C_i$ we have $O^{C_i} = \{s^i_1, \ldots, s^i_{k_i}\}$, we define

$$ p = \left[\, p_{C_1}(s^1_1)\ \ p_{C_1}(s^1_2)\ \cdots\ p_{C_1}(s^1_{k_1})\ \cdots\ p_{C_m}(s^m_1)\ \ p_{C_m}(s^m_2)\ \cdots\ p_{C_m}(s^m_{k_m}) \,\right]. \tag{2.2} $$

This association is discussed in more detail in reference [AQB+13].

For a given compatibility cover, the set of possible probability distributions is a
polytope with $\prod_{C \in \mathcal{C}} |O^C|$ vertices. Each vertex corresponds to
probability one for one of the outcomes $s \in O^C$ for each context $C \in \mathcal{C}$.
All other distributions are convex combinations of these vertices.

Let $C = \{M_1, \ldots, M_n\}$ be a context in $\mathcal{C}$. Each element of $O^C$ is a
string $s = (a_1, \ldots, a_n)$ with n elements of O. For each $U \subset C$, there is a
natural restriction

$$ r^C_U : O^C \to O^U \tag{2.3} $$

$$ s = (a_i)_{M_i \in C} \mapsto s|_U = (a_i)_{M_i \in U}. \tag{2.4} $$

This operation corresponds to dropping the elements in the string s that do not
correspond to measurements in U.

Given a probability distribution $p_C$ on a context $C \in \mathcal{C}$ we can also
naturally define marginal distributions for each $U \subset C$:

$$ p^C_U : O^U \to [0,1] $$

$$ p^C_U(s) = \sum_{s' \in O^C;\ r^C_U(s') = s} p_C(s'). \tag{2.5} $$

The superscript C in $p^C_U$ is necessary because the marginals may depend on the
context C.

Example 17. Consider the situation where

$$ X = \{M_1, M_2, M_3\} \quad\text{and}\quad \mathcal{C} = \{C_1 = \{M_1, M_2\},\ C_2 = \{M_2, M_3\}\}, $$

each measurement with two possible outcomes ±1. The extreme distribution with
$p_{C_1}(1,1) = 1$ and $p_{C_2}(-1,-1) = 1$ gives the marginals
$p^{C_1}_{\{M_2\}}(1) = 1$ and $p^{C_2}_{\{M_2\}}(1) = 0$.
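As a small illustration of equation (2.5), the following Python sketch computes the two
marginals of Example 17. The helper name `marginal` and the representation of a
distribution as a dictionary from outcome tuples to probabilities are assumptions made
for this example only; they are not notation from the text.

```python
from collections import defaultdict

def marginal(p_C, context, U):
    """Marginal p^C_U of a context distribution p_C (dict: outcome tuple -> probability)."""
    keep = [context.index(M) for M in U]         # positions of U inside the context
    out = defaultdict(float)
    for s, prob in p_C.items():
        out[tuple(s[k] for k in keep)] += prob   # add weight to the restriction s|_U
    return dict(out)

# Example 17: C1 = {M1, M2}, C2 = {M2, M3}, outcomes +1 / -1.
C1, C2 = ("M1", "M2"), ("M2", "M3")
p_C1 = {(1, 1): 1.0}      # extreme distribution on C1
p_C2 = {(-1, -1): 1.0}    # extreme distribution on C2

m1 = marginal(p_C1, C1, ("M2",))   # marginal of M2 computed from context C1
m2 = marginal(p_C2, C2, ("M2",))   # marginal of M2 computed from context C2
print(m1, m2)
```

The two marginals of $M_2$ disagree, which is exactly the dependence on the context that
the superscript in $p^C_U$ records.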

We will reject distributions with this property: we require that if two contexts $C_1$
and $C_2$ overlap, the marginals defined by $p_{C_1}$ and $p_{C_2}$ in the intersection
be the same.

Definition 45. The non-disturbance set $\mathcal{X}(\Gamma)$ is the set of probability
distributions such that if the intersection of two contexts C and C' is non-empty, then
$p^{C}_{C \cap C'} = p^{C'}_{C \cap C'}$. A probability distribution
$p \in \mathcal{X}(\Gamma)$ is called an empirical model.


The non-disturbance set is a polytope, since it is defined by a finite number of
linear inequalities and equalities: the inequalities imposed by the fact that its elements
represent probabilities and the equalities imposed by definition 45.


After imposing conditions on the restriction of the probability distributions, we ask
now if it is possible to extend the distributions $p_C$ to larger sets containing C. The
naive ultimate goal would be to define a distribution on the set $O^X$, which specifies
an assignment of outcomes to all measurements, in a way that the restrictions yield the
probabilities specified by the empirical model on all contexts in $\mathcal{C}$. A more
subtle and adequate question is to decide when it is possible to achieve this goal. This
question was first studied by Fine in reference [Fin82], for the restricted case of Bell
scenarios (see appendix B), and generalized by Brandenburger and Abramsky in reference
[AB11].


Definition 46. A global section for X is a probability distribution $p_X : O^X \to [0,1]$.
A global section for a distribution $p \in \mathcal{X}(\Gamma)$ is a global section for X
such that the restriction of $p_X$ to each context $C \in \mathcal{C}$ is equal to $p_C$.
The distributions with global section are called noncontextual.


A global section for a distribution p corresponds exactly to the existence of a
distribution defined on all measurements, which marginalizes to yield the probabilities
determined by the empirical model. If a global section for p exists, p is called
noncontextual because this global section is deeply connected to the existence of a
noncontextual hidden-variable model reproducing the statistics of p. In fact, if there is
a global section for p we can construct the hidden-variable model in the following way:
as hidden variable we use an element of the classical probability space $O^X$, and the
value assigned by $\lambda \in O^X$ to a measurement M is $\lambda(M)$. Then, the global
section $p_X$ for p provides a probability distribution on the set of hidden variables
with the property that if we average over all hidden variables according to this function
we recover the statistics of p. A proof of the converse can be found in section 8 of
reference [AB11], and this gives:

Theorem 21 (Brandenburger and Abramsky, 2011). A probability distribution
$p \in \mathcal{X}(\Gamma)$ has a global section if and only if there is a noncontextual
hidden-variable model recovering its statistics.

Some distributions do not admit global sections. They are called contextual. 

Example 18 (Contextual non-disturbing distribution). Consider the scenario
$(X, \mathcal{C}, O)$, where

$$ X = \{M_1, M_2, M_3\}, \quad \mathcal{C} = \{\{M_1, M_2\}, \{M_2, M_3\}, \{M_1, M_3\}\} \quad\text{and}\quad O = \{-1, 1\}. $$

The distribution

              (1,1)    (1,-1)   (-1,1)   (-1,-1)
  M_1 M_2      1/2       0        0        1/2
  M_2 M_3      1/2       0        0        1/2
  M_1 M_3       0       1/2      1/2        0

where entry ij of the table is the probability of obtaining outcome j when the joint
measurement i is performed, is a non-disturbing distribution, but it does not have a
global section. This distribution is the one that appears in Specker's famous parable of
the Overprotective Seer [LSW11].
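Both claims of Example 18 can be checked by brute force. The sketch below is only an
illustration under stated assumptions: the dictionary encoding of the table and the
function names are hypothetical. It uses the fact that a global section, being a sum of
nonnegative weights, must vanish on every global assignment whose restriction to some
context has probability zero. If no assignment survives this filter, no global section
can exist. This test is sufficient for contextuality but not necessary in general.

```python
from itertools import product

X = ("M1", "M2", "M3")
model = {   # table of Example 18, zero entries omitted
    ("M1", "M2"): {(1, 1): 0.5, (-1, -1): 0.5},
    ("M2", "M3"): {(1, 1): 0.5, (-1, -1): 0.5},
    ("M1", "M3"): {(1, -1): 0.5, (-1, 1): 0.5},
}

def marginal(context, dist, M):
    """Single-measurement marginal of M computed inside one context."""
    k = context.index(M)
    out = {1: 0.0, -1: 0.0}
    for s, prob in dist.items():
        out[s[k]] += prob
    return out

def non_disturbing(model):
    """True if overlapping contexts give the same marginal on shared measurements."""
    for (C, pC), (D, pD) in product(model.items(), repeat=2):
        for M in set(C) & set(D):
            if marginal(C, pC, M) != marginal(D, pD, M):
                return False
    return True

def supported_assignments(model):
    """Global assignments whose restriction to every context has nonzero probability."""
    good = []
    for values in product((1, -1), repeat=len(X)):
        lam = dict(zip(X, values))
        if all(pC.get(tuple(lam[M] for M in C), 0.0) > 0 for C, pC in model.items()):
            good.append(lam)
    return good

print(non_disturbing(model), supported_assignments(model))
```

The list of supported assignments is empty because the table demands $M_1 = M_2$,
$M_2 = M_3$ and $M_1 \neq M_3$ at the same time.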


2.3 Sheaf-theory and contextuality 


It is possible to provide a more formal mathematical formulation of contextuality using
categories and sheaf theory, as pioneered by Abramsky and co-workers [AD05, AB11].
This approach provides a direct and unified characterization of both contextuality and
non-locality, along with different new tools, insights and results. We provide a brief
introduction to the sheaf theoretical aspects of contextuality in this section and we refer
to [AB11] for more detailed definitions and discussions. We use some terminology of
category theory, which is explained in references [MM92, Mac98].

We start once again with a set X of possible measurements. The set of possible
outcomes for each measurement is O, and when a set of compatible measurements
$U \subset X$ is performed, a set of outcomes in $O^U$ will be observed. Each individual
run of the experiment is what we called an event.

Events in $O^U$ and sections over U are in bijective correspondence. Let $s : U \to O$ be
a section. The event associated to s is the event in which the measurements in U were
performed and for each $M \in U$ outcome $s(M)$ was obtained.

Define the function $\mathcal{E}$ that takes each subset $U \subset X$ to $O^U$, the set
of sections over U. We can also define a natural action by restriction according to
equation (2.4): if $U \subset U'$,

$$ r^{U'}_U : \mathcal{E}(U') \to \mathcal{E}(U), \quad s \mapsto s|_U. \tag{2.6} $$

This restriction is such that

$$ r^U_U = \mathrm{id}_U \tag{2.7} $$

and if $U \subset U' \subset U''$,

$$ r^{U'}_U \circ r^{U''}_{U'} = r^{U''}_U. \tag{2.8} $$


Let $\mathbf{Set}$ be the category whose objects are sets and arrows are functions between
sets. Let $\mathcal{P}(X)$ be the category whose objects are the subsets of X and there is
a unique arrow from U to U' if and only if $U \subset U'$. Let $\mathcal{P}(X)^{op}$ be the
category whose objects are the subsets of X and there is a unique arrow from³ U' to U if
and only if $U \subset U'$. Then, we can use the function $\mathcal{E}$ defined above as a
functor

$$ \mathcal{E} : \mathcal{P}(X)^{op} \to \mathbf{Set} $$

that takes each $U \subset X$ to $\mathcal{E}(U) = O^U$ and the unique arrow $U' \to U$ to
the restriction $r^{U'}_U$, when $U \subset U'$. Equations (2.7) and (2.8) prove that
$\mathcal{E}$ is in fact a functor and hence $\mathcal{E}$ is a presheaf.

Definition 47. Given a category $\mathbf{C}$, a functor $F : \mathbf{C}^{op} \to \mathbf{Set}$ is called a presheaf.

The functor $\mathcal{E}$ has another distinguished property. Let $\{U_i\}_{i \in I}$ be a
family of subsets of U such that $\bigcup_i U_i = U$ and
$\{s_i \in \mathcal{E}(U_i)\}_{i \in I}$ a family of sections that agree in all
intersections, that is

$$ s_i|_{U_i \cap U_j} = s_j|_{U_i \cap U_j} $$

for every $i, j \in I$. Then there is a unique section $s \in \mathcal{E}(U)$ such that
$s|_{U_i} = s_i$. In fact, given $M \in U$ there is at least one $i \in I$ such that
$M \in U_i$. Let $m = s_i(M)$. Since all sections $s_i$ agree on the overlaps, m does not
depend on the index i chosen. We define then $s(M) = m$.

This distinguished property is called the sheaf condition and $\mathcal{E}$ is called the
sheaf of events [MM92].
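The gluing argument above is constructive, and a short Python sketch may make it
concrete. Sections are represented here as dictionaries from measurement labels to
outcomes; this encoding and the names `glue` and `restrict` are assumptions of the
example.

```python
def glue(sections):
    """Glue sections (dicts: measurement -> outcome) that agree on overlaps.

    Returns the unique section on the union of the domains, or None when two
    sections assign different outcomes to a shared measurement.
    """
    glued = {}
    for s in sections:
        for M, outcome in s.items():
            # setdefault stores the first value seen for M and returns the stored one
            if glued.setdefault(M, outcome) != outcome:
                return None
    return glued

def restrict(s, U):
    """Restriction s|_U of a section s to the measurements in U."""
    return {M: s[M] for M in U}

s1 = {"M1": 1, "M2": -1}
s2 = {"M2": -1, "M3": 1}
s = glue([s1, s2])                    # section on {M1, M2, M3}
clash = glue([s1, {"M2": 1}])         # disagreement on M2, so no gluing exists
print(s, clash)
```

Restricting the glued section back to each domain returns the original sections, which
is the content of $s|_{U_i} = s_i$.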

Definition 48. Let $F : \mathcal{P}(X)^{op} \to \mathbf{Set}$ be a presheaf and
$f^{U'}_U : F(U') \to F(U)$ be the arrow in $\mathbf{Set}$ associated to the unique arrow
$U' \to U$ if $U \subset U'$. If $s \in F(U')$, let $s|_U = f^{U'}_U(s)$. We say that F is a
sheaf if it satisfies the following two conditions:

1. Locality: If $(U_i \subset X)$ is a covering of $U \subset X$, and if $s, t \in F(U)$ are
such that $s|_{U_i} = t|_{U_i}$ for each set $U_i$, then $s = t$;

2. Gluing: If $(U_i)$ is a covering of U, and if for each i there is a section $s_i$ over
$U_i$ such that for each pair $U_i, U_j$, the restrictions of $s_i$ and $s_j$ agree on the
overlaps, that is

$$ s_i|_{U_i \cap U_j} = s_j|_{U_i \cap U_j}, $$

then there is a section $s \in F(U)$ such that $s|_{U_i} = s_i$ for each i.

3 The opposite category or dual category $\mathbf{C}^{op}$ of a given category $\mathbf{C}$ is formed by reversing the
morphisms, that is, interchanging the source and target of each morphism [MM92, Mac98].


Sections correspond to definite outcomes, but most of the time it is not possible
to predict with certainty the outcome of every measurement. When probabilistic theories
enter the game we must use probability distributions over the set of sections $O^U$.
To make definitions more general, we will consider distributions taking values over a
commutative semiring R [AB11].

Definition 49. An R-distribution on U is a function $d : U \to R$ such that
$\sum_{m \in U} d(m) = 1$.

When we are interested in probability distributions, R is the semiring of nonnegative
real numbers. Nonetheless, it is quite instructive to keep R general, even when we are
working with probabilities in a compatibility scenario.

We write $\mathcal{D}_R(U)$ for the set of R-distributions on U.

Let $f : U' \to U$ be a function between two sets U' and U. We define

$$ \mathcal{D}_R(f) : \mathcal{D}_R(U') \to \mathcal{D}_R(U) $$

that takes each distribution d to the distribution $\mathcal{D}_R(f)(d) = d' : U \to R$
defined by

$$ d'(y) = \sum_{x;\ f(x) = y} d(x). $$

This definition is functorial since $\mathcal{D}_R(\mathrm{id}) = \mathrm{id}$ and
$\mathcal{D}_R(g \circ f) = \mathcal{D}_R(g) \circ \mathcal{D}_R(f)$.
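The action of $\mathcal{D}_R$ on functions is a pushforward of weights. The following
Python sketch illustrates it for R given by the nonnegative rationals, using exact
fractions so that equalities can be tested. The function name `pushforward` and the
example maps `f` and `g` are assumptions introduced here for illustration.

```python
from fractions import Fraction

def pushforward(f, d):
    """D_R(f): send a distribution d on U' (dict: x -> weight) to one on U."""
    out = {}
    for x, w in d.items():
        y = f(x)
        out[y] = out.get(y, 0) + w    # d'(y) is the sum of d(x) over x with f(x) = y
    return out

# A distribution on pairs of outcomes.
d = {(1, 1): Fraction(1, 2), (1, -1): Fraction(1, 4), (-1, -1): Fraction(1, 4)}

def f(s):            # keep the first outcome only
    return s[0]

def g(a):            # forget the sign
    return a * a

d_f = pushforward(f, d)
d_gf = pushforward(lambda s: g(f(s)), d)       # D_R(g o f)(d)
d_g_f = pushforward(g, d_f)                    # D_R(g)(D_R(f)(d))
print(d_f, d_gf, d_g_f)
```

On this example the two ways of pushing d along $g \circ f$ coincide, as functoriality
requires.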

With the definitions above we can construct the functor

$$ \mathcal{D}_R : \mathbf{Set} \to \mathbf{Set} $$

that takes each set U to the set of R-distributions on U and each function
$f : U' \to U$ to the function $\mathcal{D}_R(f) : \mathcal{D}_R(U') \to \mathcal{D}_R(U)$.

We can compose this functor with the sheaf $\mathcal{E}$ to define the presheaf

$$ \mathcal{D}_R \circ \mathcal{E} : \mathcal{P}(X)^{op} \to \mathbf{Set} $$

which assigns to each subset $U \subset X$ the set of R-distributions on the sections over
U. If $U \subset U'$, the unique arrow $U' \to U$ is taken by this presheaf to the map
$\mathcal{D}_R(r^{U'}_U)$ acting on the set of R-distributions on $O^{U'}$: if
$d \in \mathcal{D}_R(\mathcal{E}(U'))$, then

$$ \mathcal{D}_R\left(r^{U'}_U\right)(d) = d|_U, $$

where $d|_U(s) = \sum_{s';\ s'|_U = s} d(s')$.

The restriction $d|_U$ is the marginal distribution of d, which assigns to each section
s in the smaller set U the sum of the weights of all sections s' in the larger set that
restrict to s.

We now take into account the fact that not all measurements can be performed together,
which can be done by considering a compatibility cover $\mathcal{C}$ of X (see definition
41). Each subset of X that belongs to $\mathcal{C}$ is a maximal set of compatible
measurements. With the language of categories introduced above, an empirical model for
the scenario $(X, \mathcal{C}, O)$ is a family of R-distributions
$e_C \in \mathcal{D}_R(\mathcal{E}(C))$, $C \in \mathcal{C}$. Once more, we consider
only non-disturbing models, that is, we demand that

$$ e_C|_{C \cap C'} = e_{C'}|_{C \cap C'} $$

whenever $C \cap C' \neq \emptyset$.

We have already observed that the presheaf $\mathcal{E}$ is indeed a sheaf. It is natural
to ask if the same holds for the presheaf $\mathcal{D}_R \circ \mathcal{E}$. The
no-disturbance condition corresponds precisely to the first condition required for a
presheaf to be a sheaf, and hence the sheaf condition for $\mathcal{D}_R \circ \mathcal{E}$
is equivalent to the existence of a global distribution
$d \in \mathcal{D}_R \circ \mathcal{E}(X)$ such that $d|_C = e_C$ for each context C.

Theorem 21 implies that such a distribution d exists if and only if there is a hidden 
variable model reproducing the statistics of the empirical model. Hence, we have: 


Theorem 22. The empirical model $(e_C)$ satisfies the sheaf condition if and only if there
is a hidden-variable model reproducing its statistics.

A proof of this result can be found in reference [AB11].

Thus, we have a characterization of the phenomena of contextuality in terms of 
obstructions to the existence of global sections in a presheaf, which opens the door to 
the use of the methods of sheaf theory to the study of contextuality. 


2.4 Probability Distributions and Physical Theories 

2.4.1 Classical Non-Contextual Realizations 

Given a hypergraph $\Gamma$, a classical realization for $\Gamma$ is a probability space
$(\Omega, \Sigma, \mu)$, where $\Omega$ is a sample space, $\Sigma$ a $\sigma$-algebra and
$\mu$ a probability measure on $\Sigma$, and for each $i \in V$ a partition of $\Omega$
into |O| disjoint subsets $A^i_j \in \Sigma$, $j \in O$, where V is the set of vertices
of $\Gamma$. For each context $C = \{M_1, \ldots, M_n\}$, the probability of the outcome
$a_1, \ldots, a_n$ is

$$ p(a_1, \ldots, a_n | M_1, \ldots, M_n) = \mu\left( \bigcap_{k=1}^{n} A^k_{a_k} \right). $$

The probability distributions that can be written in this form are called classical
distributions. The set of classical distributions⁵ is a polytope with $|O|^{|X|}$
vertices, all of them noncontextual.

As an immediate consequence of theorem 21 we have the following result:

Corollary 9. A distribution has a global section if and only if it is classical.⁶

In fact, once a classical realization is given, the construction of the global section is
guaranteed by the fact that the intersection of a finite number of sets in a
$\sigma$-algebra also belongs to the $\sigma$-algebra. Conversely, given the global
section, we can construct the classical realization using the same argument present in
the paragraph preceding theorem 21.

4 Equivalently we can say that a distribution is non-contextual if for each $i \in V$ there is a random
variable $R_i : \Omega \to O$ and $p(a_1, \ldots, a_n | M_1, \ldots, M_n) = \mu(R_i = a_i,\ i = 1, \ldots, n)$.

5 This set depends also on the set of possible outcomes O, but we will not write this explicitly to
simplify the notation.


2.4.2 Quantum Realizations 


A quantum realization is given by a Hilbert space $\mathcal{H}$, for each $i \in V$ a
Hermitian matrix $O_i$ in this Hilbert space, and a density matrix $\rho$ acting on
$\mathcal{H}$. For a given context $C \in \mathcal{C}$, the compatibility condition
demands the existence of a basis for $\mathcal{H}$ in which all $O_i$ belonging to C are
diagonal. For each context $C = \{M_1, \ldots, M_n\}$ the probability of the outcome
$a_1, \ldots, a_n$ is

$$ p(a_1, \ldots, a_n | M_1, \ldots, M_n) = \mathrm{Tr}\left( \prod_{k=1}^{n} P_{a_k}\, \rho \right), $$

where $P_{a_k}$ is the projector over the eigenspace corresponding to outcome $a_k$ of
observable $O_k$. The probability distributions that can be written in this form are
called quantum distributions. Notice that the Hilbert space is not fixed and the set of
quantum distributions contains realizations in all dimensions. This set, which we denote
by $\mathcal{Q}(\Gamma)$, is a convex set but is not a polytope in general.


Theorem 23. The set of quantum distributions $\mathcal{Q}(\Gamma)$ is a convex set.

Proof. Let $p^1$ and $p^2$ be two quantum distributions. We want to prove that any convex
combination

$$ \alpha p^1 + \beta p^2, \quad 0 \le \alpha, \beta \le 1, \quad \alpha + \beta = 1 $$

is a quantum distribution.

Let $\rho^1$ and observables $\{O^1_i\}$ be a quantum realization for $p^1$ and $\rho^2$
and observables $\{O^2_i\}$ be a quantum realization for $p^2$, that is

$$ p^1(a_1, \ldots, a_n | M_1, \ldots, M_n) = \mathrm{Tr}\left( \prod_k P^1_{a_k}\, \rho^1 \right) $$

and similarly for $p^2$,

$$ p^2(a_1, \ldots, a_n | M_1, \ldots, M_n) = \mathrm{Tr}\left( \prod_k P^2_{a_k}\, \rho^2 \right), $$

where $P^1_{a_k}$ is the projector over the eigenspace corresponding to outcome $a_k$ of
observable $O^1_k$, and analogously for $P^2_{a_k}$.

It is important to notice here that the density matrices and projectors in the quantum
realizations for $p^1$ and $p^2$ given above do not necessarily have the same dimension.
Nonetheless, it is always possible to extend one of them to a Hilbert space of higher
dimension, so without loss of generality we will consider that all density matrices and
projectors act in the same Hilbert space $\mathcal{H}$.

Let $\{|1\rangle, |2\rangle\}$ be an orthonormal basis for $\mathbb{C}^2$ and define the
density matrix

$$ \rho = \alpha \rho^1 \otimes |1\rangle\langle 1| + \beta \rho^2 \otimes |2\rangle\langle 2| $$

and the projectors

$$ P_{a_k} = P^1_{a_k} \otimes |1\rangle\langle 1| + P^2_{a_k} \otimes |2\rangle\langle 2|, $$

acting on $\mathcal{H} \otimes \mathbb{C}^2$. Then we have that

$$ p(a_1, \ldots, a_n | M_1, \ldots, M_n) := \mathrm{Tr}\left( \prod_k P_{a_k}\, \rho \right) = \alpha\, \mathrm{Tr}\left( \prod_k P^1_{a_k}\, \rho^1 \right) + \beta\, \mathrm{Tr}\left( \prod_k P^2_{a_k}\, \rho^2 \right), $$

which implies that

$$ p = \alpha p^1 + \beta p^2. $$

Hence, any convex combination of quantum distributions is also a quantum distribution. □

6 This result shows that it is possible to use the notion of global section to define non-contextual
distributions: we say that a distribution is non-contextual if it has a global section.


It is important to mention that the use of a Hilbert space of higher dimension than
$\mathcal{H}$ cannot be avoided. In fact, if we bound the dimension of the quantum
realizations, we get a set that is not convex, as shown by Pál and Vértesi in reference
[PV09].

The set of classical distributions is contained in the set of quantum distributions. To
prove that, we just have to notice that the set of distributions obtained from a probability
space with n elements is equivalent to the set of distributions obtained with diagonal
projectors and density matrices in a Hilbert space of dimension n with a fixed basis. The
set of elements in the sample space $\Omega$ is the set of one-dimensional projectors and
the measure is given by $\mu(P) = \mathrm{Tr}(\rho P)$.


2.5 Non-Contextuality Inequalities 

We would like to find simple criteria to decide whether a probability distribution p is
noncontextual or not. According to corollary 9, this is equivalent to testing whether p
belongs to the set of classical distributions. We will use the fact that this set is a
polytope to derive a finite number of inequalities that provide necessary and sufficient
conditions for membership in this set.

A convex polytope may be defined as an intersection of a finite number of half-spaces.
Such a definition is called a half-space representation (H-representation or
H-description). There exist infinitely many H-descriptions of a convex polytope. However,
for a full-dimensional convex polytope, the minimal H-description is in fact unique and is
given by the set of facet-defining half-spaces.

Since the set of classical distributions is a polytope, there is a minimal set of
inequalities giving an H-representation. Some of these inequalities are the trivial
inequalities related to the definition of probability distributions (positivity and
normalization), but others are not and in general are not satisfied by all quantum
distributions. These inequalities are called noncontextuality inequalities.

Definition 50. A noncontextuality inequality is a linear inequality

$$ S := \sum \gamma_{a_1, \ldots, a_n | M_1, \ldots, M_n}\, p(a_1, \ldots, a_n | M_1, \ldots, M_n) \le b, \tag{2.9} $$

where all $\gamma_{a_1, \ldots, a_n | M_1, \ldots, M_n}$ and b are real numbers, which is
satisfied by all elements of the classical polytope and violated by some contextual
distribution. A tight noncontextuality inequality is a linear inequality defining a
non-trivial facet of the classical polytope.

Any H-description provides a necessary and sufficient condition for membership in
the classical polytope: a distribution p is classical if and only if it satisfies all
noncontextuality inequalities for this scenario. Although verifying whether a distribution
satisfies the inequalities is very simple, finding all inequalities that provide an
H-description for a general scenario is a very difficult computational task, related to
the max-cut problem, which belongs to the NP-hard class of computational complexity
[BM86, DL97, AII06].


2.6 The KCBS inequality 

The KCBS scenario was introduced by Klyachko, Can, Binicioğlu, and Shumovsky in
reference [KCBS08]. It consists of five measurements $X = \{M_0, M_1, M_2, M_3, M_4\}$,
with compatibility structure given by

$$ \mathcal{C} = \{\{M_0, M_1\}, \{M_1, M_2\}, \{M_2, M_3\}, \{M_3, M_4\}, \{M_0, M_4\}\}. $$

The set of possible outcomes is $O = \{\pm 1\}$. The hypergraph $\Gamma$ in this case is a
familiar simple graph: the pentagon.

This scenario was completely characterized in references [Ara12, AQB+13]. There
are $2^4$ tight noncontextuality inequalities and all of them can be written in the form

$$ \sum_{i=0}^{4} \gamma_i \langle M_i M_{i+1} \rangle \le 3, \tag{2.10} $$

where $\langle M_i M_{i+1} \rangle = p(M_i = M_{i+1}) - p(M_i \neq M_{i+1})$,
$\gamma_i \in \{\pm 1\}$, the indices are taken modulo 5, and the number of
$\gamma_i = -1$ is odd.

The inequality obtained when all $\gamma_i = -1$ is the famous KCBS inequality, presented
in the seminal paper [KCBS08]. For distributions in which the outcomes $P_i = 1$ and
$P_{i+1} = 1$ never occur together, it is equivalent to the inequality

$$ \sum_{i=0}^{4} \langle P_i \rangle \le 2, \tag{2.11} $$

where $P_i = (1 + M_i)/2$ and $\langle P_i \rangle = p(P_i = 1)$.

Figure 2.1: The compatibility hypergraph of the KCBS scenario.

These inequalities are violated by some quantum distributions in dimension three or
higher. For the KCBS inequality, the quantum maximum of the left-hand side of (2.10) is
$4\sqrt{5} - 5 \approx 3.94$ (that is, $\sum_i \langle M_i M_{i+1} \rangle$ reaches
$5 - 4\sqrt{5}$), which corresponds to a maximal quantum value of $\sqrt{5}$ for
inequality (2.11). These violations can be obtained with the state
$|\psi\rangle = (1, 0, 0)$ and with projectors $P_i = |v_i\rangle\langle v_i|$, where

$$ |v_i\rangle = \left( \cos\theta,\ \sin\theta \cos\frac{4\pi i}{5},\ \sin\theta \sin\frac{4\pi i}{5} \right) $$

and $\cos^2\theta = \dfrac{\cos(\pi/5)}{1 + \cos(\pi/5)}$.

An interesting property of these projectors is that $P_i$ and $P_j$ are orthogonal if
$(i, j) \in E(\Gamma)$. This implies that the outcome (1,1) can never occur in a joint
measurement of $M_i$ and $M_j$.
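The numbers quoted above can be reproduced numerically. The sketch below is a minimal
check in plain Python, assuming rank-one projectors $P_i = |v_i\rangle\langle v_i|$ and
the relation $M_i = 2P_i - 1$; the variable names are chosen for this example only.

```python
import math

n = 5
cos2 = math.cos(math.pi / n) / (1 + math.cos(math.pi / n))    # cos^2(theta)
c, s = math.sqrt(cos2), math.sqrt(1 - cos2)

# Unit vectors |v_i> defining the rank-one projectors P_i = |v_i><v_i|.
v = [(c, s * math.cos(4 * math.pi * i / n), s * math.sin(4 * math.pi * i / n))
     for i in range(n)]
psi = (1.0, 0.0, 0.0)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Overlaps between neighbours in the pentagon; they vanish when the P_i are exclusive.
overlaps = [dot(v[i], v[(i + 1) % n]) for i in range(n)]

# <P_i> = |<psi|v_i>|^2 and the left-hand side of (2.11).
sum_P = sum(dot(psi, v[i]) ** 2 for i in range(n))

# With M_i = 2 P_i - 1 and P_i P_{i+1} = 0: -sum_i <M_i M_{i+1}> = 4 sum_i <P_i> - 5.
S = 4 * sum_P - n
print(overlaps, sum_P, S)
```

Up to floating-point error, the overlaps vanish, `sum_P` agrees with $\sqrt{5}$ and `S`
agrees with $4\sqrt{5} - 5$.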

Some non-disturbing distributions can achieve the algebraic maximum violation of 5
for inequality (2.10).

Example 19. The non-disturbing distribution


where $\gamma_i = 1$ for $i = 0, 1, 2, 3$ and $\gamma_4 = -1$, reaching the algebraic
maximum for the KCBS inequality (2.10).
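For concreteness, the following Python sketch builds one distribution that fits this
description and checks both properties. The specific choice made here (uniform marginals,
perfect correlation in the contexts with $\gamma_i = 1$ and perfect anticorrelation in the
context with $\gamma_4 = -1$) is an assumption of the example and is not taken from the
source.

```python
n = 5
gamma = [1, 1, 1, 1, -1]

# Context i is (M_i, M_{i+1 mod 5}). Outcomes are perfectly correlated when
# gamma_i = +1 and perfectly anticorrelated when gamma_i = -1.
model = {i: {(a, g * a): 0.5 for a in (1, -1)} for i, g in enumerate(gamma)}

def correlator(dist):
    """<M_i M_{i+1}> = p(equal outcomes) - p(different outcomes)."""
    return sum(a * b * p for (a, b), p in dist.items())

def marginal(dist, position):
    """Marginal of the measurement sitting at the given position of the context."""
    out = {1: 0.0, -1: 0.0}
    for s, p in dist.items():
        out[s[position]] += p
    return out

# M_{i+1} is the second entry of context i and the first entry of context i+1.
non_disturbing = all(
    marginal(model[i], 1) == marginal(model[(i + 1) % n], 0) for i in range(n)
)
S = sum(g * correlator(model[i]) for i, g in enumerate(gamma))
print(non_disturbing, S)
```

Every term $\gamma_i \langle M_i M_{i+1} \rangle$ equals one for this distribution, so
the sum reaches the algebraic maximum 5 while all single-measurement marginals remain
uniform.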

This shows that, in general, the violation obtained with non-disturbing distributions
is higher than the quantum maximum, and hence that the non-disturbance polytope
properly contains the quantum set.


2.7 The n-cycle inequalities


A simple generalization of the KCBS inequality is obtained when we use as the
compatibility hypergraph an n-cycle: a graph with n vertices $0, 1, \ldots, n-1$ and such
that two vertices i, j are connected iff $i - j \equiv \pm 1 \pmod{n}$. The corresponding
scenario has n measurements $X = \{M_0, M_1, \ldots, M_{n-1}\}$, with compatibility
structure given by

$$ \mathcal{C} = \{\{M_0, M_1\}, \{M_1, M_2\}, \ldots, \{M_{n-2}, M_{n-1}\}, \{M_{n-1}, M_0\}\}. $$

The set of possible outcomes is also $O = \{\pm 1\}$. The complete set of noncontextuality
inequalities for this scenario was found in reference [AQB+13].

Theorem 24. There are $2^{n-1}$ tight noncontextuality inequalities for the n-cycle
scenario, and they are of the form

$$ \sum_{i=0}^{n-1} \gamma_i \langle M_i M_{i+1} \rangle \le n - 2, \tag{2.12} $$

where the indices are taken modulo n, $\gamma_i = \pm 1$, and the number of indices i such
that $\gamma_i = -1$ is odd.
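The classical bound in this theorem can be verified by enumeration for small n, since a
linear function on the classical polytope attains its maximum at a vertex, that is, at a
deterministic assignment of ±1 to every measurement. The Python sketch below is such a
check under that assumption; it does not address tightness, and the function names are
introduced only for this example.

```python
from itertools import product

def classical_max(gamma):
    """Maximum of sum_i gamma_i * m_i * m_{i+1} over deterministic assignments m."""
    n = len(gamma)
    return max(
        sum(g * m[i] * m[(i + 1) % n] for i, g in enumerate(gamma))
        for m in product((1, -1), repeat=n)
    )

def odd_sign_patterns(n):
    """All sign vectors gamma with an odd number of entries equal to -1."""
    return [g for g in product((1, -1), repeat=n) if g.count(-1) % 2 == 1]

# For each n, collect the set of classical maxima over all admissible sign patterns.
bounds = {n: {classical_max(g) for g in odd_sign_patterns(n)} for n in range(3, 8)}
counts = {n: len(odd_sign_patterns(n)) for n in range(3, 8)}
print(bounds, counts)
```

For every n in the range tested there are $2^{n-1}$ sign patterns and each of them has
classical maximum $n - 2$. The reason is that the product of all terms
$\gamma_i m_i m_{i+1}$ equals $\prod_i \gamma_i = -1$, so at least one term must be
negative.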


Some quantum distributions violate this bound if $n \ge 4$. The maximum quantum
violation is given by

$$ \begin{cases} \dfrac{3n\cos(\pi/n) - n}{1 + \cos(\pi/n)} & \text{if } n \text{ is odd}, \\[1ex] n\cos(\pi/n) & \text{if } n \text{ is even}. \end{cases} \tag{2.13} $$

For n odd, the quantum bound can be achieved already in a three-dimensional system,
with the state $(1, 0, 0)$ and measurements $M_i = 2|v_i\rangle\langle v_i| - I$, where

$$ |v_i\rangle = \left( \cos\theta,\ \sin\theta \cos\frac{i\pi(n-1)}{n},\ \sin\theta \sin\frac{i\pi(n-1)}{n} \right) $$

and $\cos^2\theta = \dfrac{\cos(\pi/n)}{1 + \cos(\pi/n)}$.

For n even, the quantum bound can be achieved in a four-dimensional system, with
the state $\left(0, \tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}, 0\right)$ and measurements
$M_i = X_i \otimes I$ for odd i and $M_i = I \otimes X_i$ for even i, where
$X_i = \cos(i\pi/n)\,\sigma_x + \sin(i\pi/n)\,\sigma_z$.
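The even case can be checked numerically without any linear algebra library. The sketch
below assumes the state $(0, 1/\sqrt{2}, -1/\sqrt{2}, 0)$ written as a 2x2 array of
amplitudes, observables $X_i = \cos(i\pi/n)\sigma_x + \sin(i\pi/n)\sigma_z$, and the sign
choice $\gamma_i = -1$ for $i < n-1$ and $\gamma_{n-1} = +1$; the sign choice and all
function names are assumptions of this example.

```python
import math

SX = ((0.0, 1.0), (1.0, 0.0))     # sigma_x
SZ = ((1.0, 0.0), (0.0, -1.0))    # sigma_z

def X(i, n):
    """X_i = cos(i*pi/n) sigma_x + sin(i*pi/n) sigma_z as a real 2x2 matrix."""
    c, s = math.cos(i * math.pi / n), math.sin(i * math.pi / n)
    return tuple(tuple(c * SX[a][b] + s * SZ[a][b] for b in range(2)) for a in range(2))

# PSI[a][b] is the amplitude of |a>|b> in the state (0, 1/sqrt2, -1/sqrt2, 0).
r = 1 / math.sqrt(2)
PSI = ((0.0, r), (-r, 0.0))

def expectation(A, B):
    """<psi| A (x) B |psi> for real 2x2 matrices A and B."""
    return sum(PSI[a][b] * A[a][c] * B[b][d] * PSI[c][d]
               for a in range(2) for b in range(2)
               for c in range(2) for d in range(2))

def quantum_value(n):
    """Left-hand side of (2.12) for even n with gamma = (-1, ..., -1, +1)."""
    gamma = [-1] * (n - 1) + [1]
    total = 0.0
    for i in range(n):
        j = (i + 1) % n
        # Odd-indexed observables act on the first factor, even-indexed on the second.
        odd, even = (i, j) if i % 2 == 1 else (j, i)
        total += gamma[i] * expectation(X(odd, n), X(even, n))
    return total

values = {n: quantum_value(n) for n in (4, 6, 8)}
print(values)
```

For each even n tested the value agrees with $n\cos(\pi/n)$ up to floating-point error;
for $n = 4$ this is the familiar $2\sqrt{2}$.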

These bounds were calculated with the help of the tools we will introduce in the next 
section. 

The interest in this scenario comes from the fact that all distributions in scenarios
where the compatibility graph has no closed loop are noncontextual.

Theorem 25. There is a quantum contextual distribution if and only if $\Gamma$ has an
n-cycle as induced subgraph with $n > 3$.

Equivalently, we may say that there is quantum violation of some noncontextuality
inequality for the scenario if, and only if, $\Gamma$ has an n-cycle as induced subgraph
with $n > 3$. In this sense, the n-cycle scenarios are the simplest ones where it is
possible to find quantum violations of noncontextuality inequalities. For a proof of this
result, see reference [BM10].


2.8 The Exclusivity Graph 

Given a scenario, it is possible to define another graph related to it that allows the
calculation of several bounds for the associated inequalities. We introduce some
definitions first. In what follows

$$ a_1, \ldots, a_n | M_1, \ldots, M_n $$

will denote the event where compatible measurements $M_1, \ldots, M_n$ were performed and
outcomes $a_1, \ldots, a_n$ were obtained.

Since each outcome $a_i$ in measurement $M_i$ is associated to an element of
$\mathcal{T}$, the event $a_1, \ldots, a_n | M_1, \ldots, M_n$ is associated to a
composition of transformations, which is also a transformation according to corollary 3.

Definition 51. We say that two events are exclusive if the corresponding transforma¬ 
tions represent different outcomes of the same measurement. 

Definition 52. Given a scenario, the exclusivity graph $\mathcal{G}$ of this scenario is
the simple graph whose vertices are labeled by all possible events

$$ a_1, \ldots, a_n | M_1, \ldots, M_n $$

in this scenario. Two vertices are connected by an edge if and only if the corresponding
events are exclusive.

Generally, not all possible events are involved in a given inequality. The ones involved
define an induced subgraph of $\mathcal{G}$ from which we can get a lot of information
about the inequality.

Definition 53. The exclusivity graph G of a noncontextuality inequality is the induced
subgraph of $\mathcal{G}$ defined by the vertices that correspond to events appearing in
the inequality.

Example 20 (The exclusivity graphs of the n-cycle inequalities). Since

$$ \langle M_i M_j \rangle = 2\left[ p(1, 1 | M_i M_j) + p(-1, -1 | M_i M_j) \right] - 1 $$

and

$$ -\langle M_i M_j \rangle = 2\left[ p(1, -1 | M_i M_j) + p(-1, 1 | M_i M_j) \right] - 1, $$

there are 2n events in each noncontextuality inequality for the n-cycle scenario. If n
is odd, the corresponding exclusivity graph is the prism graph of order n, $Y_n$, and if n
is even, the exclusivity graph is the Möbius ladder of order 2n, $M_{2n}$. The first four
of these graphs are depicted in figure 2.2.

Figure 2.2: Exclusivity graphs for the n-cycle inequalities for n = 3,4,5,6.


We restrict ourselves now to the case where all coefficients
$\gamma_{a_1, \ldots, a_n | M_1, \ldots, M_n}$ in equation (2.9) are equal to one. Many
important inequalities can be written in this form, including the n-cycle inequalities.
In this case, we can use the exclusivity graph G and some graph functions to get
information about the maximal bounds for the quantity S in different probabilistic
theories. First, a few definitions from graph theory.


Definition 54. An independent set or stable set in a graph G is a set of vertices of 
G, no two of which are adjacent. A maximum independent set is an independent set of 
largest possible size for G. 


Definition 55. The independence number a(G) of a graph G is the cardinality of a 
maximum independent set of G. 

Definition 56. Let $\{1, \ldots, n\}$ be the set of vertices of a graph G. An orthonormal
representation for G in a finite-dimensional vector space V with inner product is a set
of unit vectors $\{|u_1\rangle, \ldots, |u_n\rangle\}$ such that $|u_i\rangle$ and
$|u_j\rangle$ are orthogonal whenever i and j are connected in G.

Definition 57. The Lovász number of a graph G is

$$ \vartheta(G) = \max \sum_i |\langle u_i | \psi \rangle|^2, $$

where the maximum is taken over all V, over all orthonormal representations
$\{|u_1\rangle, \ldots, |u_n\rangle\}$ for G and over all unit vectors $|\psi\rangle$ in V.
An orthonormal representation achieving the maximum, called an optimal orthonormal
representation, always exists.

Both $\alpha(G)$ and $\vartheta(G)$ are extremely important for the study of classical and
quantum bounds of noncontextuality inequalities. For a more detailed discussion about
these graph functions, see [Lov79, Lov95, Knu94, Ros67, Bol98].

Theorem 26 (Cabello, Severini and Winter, 2010). The classical bound of a
noncontextuality inequality is the independence number $\alpha(G)$ of the exclusivity
graph G of the inequality.

Proof. Since the noncontextuality inequalities are linear, the maximum classical bound is achieved at a vertex of the noncontextual polytope. For such a vertex, the probability of each event is either zero or one, and the value of the sum S for this distribution is equal to the number of events with probability one. Since the sum of the probabilities of two exclusive events cannot be higher than one, two connected vertices cannot have probability equal to one at the same time. Hence, the set of vertices whose probabilities are one is an independent set, and so cannot have more than $\alpha(G)$ elements.

To prove that equality holds, it suffices to take any maximum independent set and use the classical distribution that assigns probability one to each vertex in this set.

□
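As a small illustration of Theorem 26, the following sketch computes the independence number of a graph by exhaustive search and applies it to the pentagon. The function names and the edge-list encoding are assumptions made for this example only; brute force is practical only for small graphs.

```python
from itertools import combinations


def cycle_edges(n):
    """Edge list of the n-cycle C_n on vertices 0..n-1."""
    return [(i, (i + 1) % n) for i in range(n)]


def independence_number(n_vertices, edges):
    """Brute-force alpha(G): size of a largest set of pairwise non-adjacent vertices."""
    edge_set = {frozenset(e) for e in edges}
    # Try the largest candidate sizes first and stop at the first independent set found.
    for size in range(n_vertices, 0, -1):
        for subset in combinations(range(n_vertices), size):
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                return size
    return 0


# Classical bound for an inequality whose exclusivity graph is the pentagon.
pentagon_bound = independence_number(5, cycle_edges(5))
print(pentagon_bound)
```

For the pentagon the search returns 2, which matches the classical bound of the pentagonal inequalities discussed in this chapter.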


Theorem 27 (Cabello, Severini and Winter, 2010). The quantum bound of a noncontextuality inequality is upper bounded by the Lovász number $\vartheta(G)$ of the exclusivity graph G of the inequality.⁷

Proof. The maximal quantum value for S is obtained for a pure state $\rho = |\psi\rangle\langle\psi|$. Let $\{e_i\}$, $e_i = a^i_0, \ldots, a^i_{n_i} | M^i_0, \ldots, M^i_{n_i}$, be the set of events present in the inequality and $P_i = \prod_k P^i_{a_k}$ be the projector corresponding to $e_i$, where $P^i_{a_k}$ is the projector associated to outcome $a^i_k$ for measurement $M^i_k$. Define

$$|v_i\rangle = \frac{P_i|\psi\rangle}{\|P_i|\psi\rangle\|}.$$

Then we have

$$S = \sum_i p(a^i_0, \ldots, a^i_{n_i} | M^i_0, \ldots, M^i_{n_i}) = \sum_i |\langle\psi|v_i\rangle|^2.$$

If $e_i$ and $e_j$ are exclusive events, the corresponding projectors $P_i$ and $P_j$ are orthogonal, and hence $|v_i\rangle$ and $|v_j\rangle$ are also orthogonal. The set of vectors $|v_i\rangle$ and the state $|\psi\rangle$ provide an orthogonal representation for G and

$$\sum_i |\langle\psi|v_i\rangle|^2 \le \vartheta(G).$$

□


Example 21 (Quantum bound for the n-cycle inequalities). The observation that

$$\vartheta(Y_n) = \frac{3n\cos(\pi/n) - n}{1 + \cos(\pi/n)}, \qquad \vartheta(M_{2n}) = n\cos(\pi/n),$$

and theorem 27 were used by the authors in reference [AQB+13] to find the quantum maximum violation of the n-cycle inequalities.


7 If the coefficients of the inequality are not all equal to one, we use the weighted versions of $\alpha$ and $\vartheta$.


Although in the previous example the bound was tight, this is not true in general. The bound can fail to be tight when the scenario imposes extra constraints that make the optimal Lovász representations for the graph unattainable for quantum systems.


Example 22. In reference [SBBC13] we find three inequalities for which $\vartheta(G)$ is larger than the quantum maximum. Consider the scenario where the system is composed of two spatially separated parties. In the first subsystem there are two measurements available, denoted by $A_0$ and $A_1$, and in the second subsystem we also have two measurements available, denoted by $B_0$ and $B_1$. All measurements have two possible outputs, 0 and 1. In this case, the compatibility of the measurements in different systems is guaranteed by spatial separation (for more details, see appendix B). The compatibility hypergraph is a square, with edges linking measurements in different parties, as shown in figure 2.3.



Figure 2.3: The compatibility hypergraph for the bipartite scenario with measurements $\{A_0, A_1\}$ for the first party and measurements $\{B_0, B_1\}$ for the second party.


This scenario admits two noncontextuality inequalities with quantum bound larger than the classical bound for which the exclusivity graph is a pentagon:

$$p(00|00) + p(11|01) + p(10|11) + p(00|10) + p(11|00) \le 2,$$

$$p(00|00) + p(11|01) + p(10|11) + p(00|10) + p(\_1|\_0) \le 2.$$


In the inequalities above, $ab|xy$ denotes the event where the first party applies measurement $A_x$ and gets outcome $a$ and the second party applies measurement $B_y$ and gets outcome $b$; $\_1|\_0$ corresponds to the event where the second party applies measurement $B_0$ and gets outcome 1, irrespective of the first party's action.

The quantum bound for the first inequality is approximately 2.178, while for the second it is approximately 2.207. The events appearing in these inequalities and their exclusivity structures are shown in figure 2.5 (a) and (b).

Consider also the scenario where the first party has three measurements, instead of two. The compatibility hypergraph of this scenario is shown in figure 2.4.



Figure 2.4: The compatibility hypergraph for the bipartite scenario with measurements $\{A_1, A_2, A_3\}$ for the first party and measurements $\{B_1, B_2\}$ for the second party.


This scenario admits one noncontextuality inequality with quantum bound larger than the classical bound for which the exclusivity graph is a pentagon:

$$p(00|00) + p(11|01) + p(10|11) + p(00|10) + p(11|20) \le 2.$$

The quantum bound for this inequality is approximately 2.207. The events appearing in this inequality and their exclusivity structure are shown in figure 2.5 (c).

For each of these inequalities, the quantum bound is strictly smaller than the Lovász number of the pentagon, $\vartheta(C_5) = \sqrt{5} \approx 2.236$. This proves that, in general, $\vartheta(G)$ gives only a loose upper bound for the maximum quantum value of the inequality.


2.9 Contextuality: the Exclusivity-Graph Approach 


2.9.1 A graph approach to the Bell-Kochen-Specker Theorem 

The mathematical content of the original proof of the Bell-Kochen-Specker theorem is that there are sets of one-dimensional projectors for which it is not possible to assign definite values 0 or 1 noncontextually in such a way that, if a set of mutually orthogonal projectors adds to identity, then the value 1 must be assigned to one, and only one, of them (for more details, see section A.3 of appendix A).

The usual physical interpretation of this result connects each projector to a measurement in a quantum system with possible outcomes 0 and 1. The noncontextuality assumption translates into the observation that the value assigned to each measurement is independent of other compatible measurements performed simultaneously. With this association, the theorem implies the impossibility of noncontextual assignment of definite values to all measurements in a quantum system consistently with the quantum statistics, proving the impossibility of noncontextual hidden-variable models.

Figure 2.5: The labeling of the exclusivity graph for the three noncontextuality inequalities with pentagonal exclusivity structure.

The set of one-dimensional projectors in a proof of the Bell-Kochen-Specker theorem 
can be represented using a graph, known as the Kochen-Specker diagram. The vertices 
of the graph are the projectors in the set and two of them are joined by an edge whenever 
they are compatible. 

We can look at this result from a different perspective. Instead of associating each projector with a measurement, we will use the fact that any projector P belongs to the set of allowed transformations $\mathcal{T}$ of the system and associate it with a possible outcome of a measurement. With this interpretation, each vertex in a Kochen-Specker diagram corresponds to an element of $\mathcal{T}$ and two vertices are connected by an edge if the corresponding transformations can be associated to two different outcomes of one and the same measurement.

Suppose now that a hidden-variable model is given. This model provides definite 
values to all measurements, and hence, given a transformation P, we know if the outcome 
it corresponds to occurs or not. If the outcome associated to the measurement by the 
hidden-variable model is the one that corresponds to P, we associate the value 1 to P. 
Otherwise, we associate the value 0 to P. 

If we have a set of projectors $\{P_1, \ldots, P_n\}$ summing up to identity, we know that there is a measurement for which the outcomes are associated to these projectors. Hence, since one, and only one, outcome must occur, one, and only one, of these projectors is associated to the value 1. Hence, we have

$$\sum_i v(P_i) = 1, \qquad (2.14)$$

where $v(P_i)$ is the value assigned by the model to projector $P_i$.

In this new perspective, the noncontextuality assumption means that the value associated to a projector P by the hidden-variable model is independent of the other projectors used to define the measurement. As we have seen, the same transformation corresponds to an outcome of several different measurements. Then, whenever P corresponds to an outcome of different measurements $M_1, M_2, \ldots, M_n$, a noncontextual hidden-variable model assigns the outcome corresponding to P to some $M_i$ if and only if it does for all other $M_j$.

We can also see the KCBS inequality in this new perspective. The compatibility graph G of this scenario is a pentagon and the maximum quantum violation is obtained with projectors $P_i$ such that $P_i$ and $P_j$ are orthogonal if $(i,j) \in E(G)$. This observation leads to two different interpretations of the graph G in quantum realizations in this particular case. First, each vertex $i$ of G can be viewed as the observable associated to the projector $P_i$. The second way to interpret G is associating a measurement M to every edge $(i,j) \in E(G)$ which includes outcomes associated to $P_i$ and $P_j$.

In the exclusivity graph approach, we start with a graph G with vertices $V(G)$ and edges $E(G)$. For each $i \in V$ there is a transformation $P_i \in \mathcal{T}$ in a probabilistic model and for each $(i,j) \in E(G)$ a measurement among whose outcomes are $P_i$ and $P_j$. Hence, the events represented by adjacent vertices are mutually exclusive.

Given a graph G, a physical model for G is a set of measurements in a physical system, one for each edge in $E(G)$. For a given state of the system, there is a probability associated to each event $i \in V$. We collect these probabilities in a vector $p \in \mathbb{R}^{|V|}$. The set of possible vectors depends on the physical theory used to describe the system and we will study this set for classical probability theories, quantum theory and generalized probabilistic theories with certain properties, as explained below.

2.9.2 Classical Non-Contextual Realizations 

A classical realization for G is given by a probability space $(\Omega, \Sigma, \mu)$, where $\Omega$ is a sample space, $\Sigma$ a $\sigma$-algebra and $\mu$ a probability measure on $\Sigma$, and for each $i \in V$ a set $A_i \in \Sigma$ such that $A_i \cap A_j = \emptyset$ if $(i,j)$ belongs to $E(G)$. For each $i$ the probability of outcome $i$ is

$$p_i = \mu(A_i).$$

The set of probability vectors obtained with classical models, $\mathcal{E}_C(G)$, is a polytope. Distributions that belong to this set are called noncontextual distributions. Incidentally, this set is a well-known convex polytope in computer science literature, where it is denoted by STAB(G) [Knu94, Ros67].
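To make the classical set more concrete, the sketch below lists the deterministic distributions for the pentagon. It relies on the standard fact that STAB(G) is the convex hull of the incidence vectors of the independent sets of G; the helper names are hypothetical and chosen only for this example.

```python
from itertools import combinations


def independent_sets(n_vertices, edges):
    """All independent sets of the graph, including the empty set."""
    edge_set = {frozenset(e) for e in edges}
    found = []
    for size in range(n_vertices + 1):
        for subset in combinations(range(n_vertices), size):
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                found.append(subset)
    return found


def incidence_vector(subset, n_vertices):
    """0/1 probability vector assigning probability one exactly to the events in subset."""
    return tuple(1 if i in subset else 0 for i in range(n_vertices))


pentagon = [(i, (i + 1) % 5) for i in range(5)]
stab_vertices = [incidence_vector(s, 5) for s in independent_sets(5, pentagon)]

# Largest value of sum_i p_i over the deterministic distributions.
classical_maximum = max(sum(v) for v in stab_vertices)
print(len(stab_vertices), classical_maximum)
```

Every noncontextual distribution for the pentagon is a convex combination of the vectors in `stab_vertices`, so a linear function such as the sum of probabilities attains its classical maximum on one of them.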

2.9.3 Quantum Realizations 

A quantum realization for G is given by a density matrix $\rho$ acting in a Hilbert space $\mathcal{H}$ and for each $i \in V$ a projector $P_i$ acting in $\mathcal{H}$ such that $P_i$ and $P_j$ are orthogonal if $(i,j)$ belongs to $E(G)$. For each $i$ the probability of the outcome $i$ is

$$p_i = \mathrm{Tr}(P_i\rho).$$

The set of probability vectors obtained with quantum realizations will be denoted by $\mathcal{E}_Q(G)$ and it is not a polytope in general. This set is a well-known convex body in computer science literature, where it is denoted by TH(G) [Knu94, Ros67]. Distributions that belong to this set are called quantum distributions.

If we fix a basis for $\mathcal{H}$ and consider all matrices diagonal in this basis we recover the classical distributions. Hence

$$\mathcal{E}_C(G) \subset \mathcal{E}_Q(G).$$


2.9.4 The Exclusivity Principle 

The main point of this work is to provide physical principles that single out quantum theory in the landscape of theories presented in chapter 1. With this purpose in mind, we will also consider probability distributions obtained when we use generalized probability theories, but we demand that they satisfy the following principle:

Principle 1 (The Exclusivity Principle). Given a set $\{e_k\}$ of pairwise exclusive events, the corresponding probabilities $p_k$ satisfy the following inequality:

$$\sum_k p_k \le 1. \qquad (2.15)$$


From now on, we refer to the Exclusivity principle simply as the E-principle. 

From the graph-theoretical point of view, this restriction is equivalent to imposing the condition that whenever the set of vertices $\{i_k\}$ is a clique⁸ in G, the sum of the corresponding probabilities $p_{i_k}$ cannot exceed one.
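The clique condition can be checked mechanically for small graphs. The following sketch enumerates all cliques by brute force and tests whether a given probability vector satisfies the E principle; the function names and the tolerance parameter are assumptions made for this illustration.

```python
from itertools import combinations


def cliques(n_vertices, edges):
    """All sets of at least two pairwise adjacent vertices."""
    edge_set = {frozenset(e) for e in edges}
    result = []
    for size in range(2, n_vertices + 1):
        for subset in combinations(range(n_vertices), size):
            if all(frozenset(pair) in edge_set
                   for pair in combinations(subset, 2)):
                result.append(subset)
    return result


def satisfies_e_principle(p, edges, tol=1e-12):
    """True if p is a vector of probabilities whose sum over every clique is at most one."""
    if any(x < -tol or x > 1 + tol for x in p):
        return False
    return all(sum(p[i] for i in clique) <= 1 + tol
               for clique in cliques(len(p), edges))


pentagon = [(i, (i + 1) % 5) for i in range(5)]
triangle = [(0, 1), (1, 2), (0, 2)]

print(satisfies_e_principle([0.5] * 5, pentagon))   # every edge sums to exactly one
print(satisfies_e_principle([0.5] * 3, triangle))   # the three events together sum to 1.5
```

The triangle shows why the principle is stated for whole sets of pairwise exclusive events: assigning probability 1/2 to each vertex respects every pair, yet the three mutually exclusive events together exceed one.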

Specker pointed out that, in quantum theory, pairwise joint measurability of a set M of observables implies joint measurability of M, while in other theories this implication does not need to hold [Spe60]. This property is known as the Specker principle. Later, Specker conjectured that this is the fundamental theorem of quantum theory [Spe09]. The E principle is a consequence of the Specker principle, as shown in reference [NBD+13].

The E principle can be used to explain why (some) distributions outside the quantum set are forbidden. Many promising results have been found so far, as we discuss in chapter 3.


2.9.5 E-Principle Realizations 

An E-principle realization for G is given by a state in a probabilistic model and for each $i \in V$ a transformation $T_i \in \mathcal{T}$, such that the corresponding probability distribution satisfies the E principle.

8 A clique in G is a complete induced subgraph of G. 
The distributions obtained in this way are called E-principle distributions. The set of all E-principle distributions, denoted by $\mathcal{E}_E(G)$, is also a polytope. This set is a well-known convex polytope in computer science literature, where it is denoted by QSTAB(G) [Knu94, Ros67].

It is a known fact from computer science literature that $TH(G) \subset QSTAB(G)$, which is equivalent to $\mathcal{E}_Q(G) \subset \mathcal{E}_E(G)$. This was also proven in references [CSW14, FSA+13].

Theorem 28. The quantum distributions satisfy the E principle. 

Proof. In quantum theory, exclusive events are associated to orthogonal projectors. Hence, if $\{e_i\}$ is a set of mutually exclusive events, a quantum realization will provide a set $\{P_i\}$ of mutually orthogonal projectors. As a consequence we have

$$\sum_i P_i \le I$$

and hence

$$\sum_i p_i = \sum_i \mathrm{Tr}(P_i\rho) \le \mathrm{Tr}(\rho) = 1.$$

□


2.10 Non-contextuality inequalities in the 
exclusivity-graph approach 

Once more, since the set $\mathcal{E}_C(G)$ is a polytope, it admits an H-description: a finite set of linear inequalities which provide necessary and sufficient conditions for membership in this set.

Definition 58. A noncontextuality inequality is a linear inequality

$$\sum_i \gamma_i p_i \le b, \qquad (2.16)$$

where all $\gamma_i$ and $b$ are real numbers, which is satisfied by all elements of the classical polytope $\mathcal{E}_C(G)$ and violated by some contextual distribution. A tight noncontextuality inequality is a linear inequality defining a non-trivial facet of the classical polytope $\mathcal{E}_C(G)$.

To obtain necessary and sufficient conditions for membership in $\mathcal{E}_C(G)$, we have to find all tight noncontextuality inequalities for G. This is a difficult problem in general, and sometimes it is useful to concentrate on one particular inequality and find out what information it can give.

Given a graph $G = (V, E)$, consider, for example, the sum of probabilities

$$\beta = \sum_{i \in V} p_i. \qquad (2.17)$$

We can use this sum to provide necessary conditions for membership in $\mathcal{E}_C(G)$, $\mathcal{E}_Q(G)$ and $\mathcal{E}_E(G)$. To derive these conditions we need to identify the maximum values of $\beta$ for each of classical, quantum and E-principle realizations, which will be denoted respectively by $\beta_C$, $\beta_Q$ and $\beta_E$. Naturally, by theorem 28 and the fact that $\mathcal{E}_C(G) \subset \mathcal{E}_Q(G)$, we have

$$\beta_C \le \beta_Q \le \beta_E.$$

The inequality

$$\sum_{i \in V} p_i \le \beta_C \qquad (2.18)$$

is a noncontextuality inequality as long as $\beta_C < \beta_Q$, and

$$\sum_{i \in V} p_i \le \beta_Q \qquad (2.19)$$

is a necessary condition for membership in $\mathcal{E}_Q(G)$.

Also in the exclusivity-graph approach, the graph functions $\alpha(G)$ and $\vartheta(G)$ can be used to calculate $\beta_C$ and $\beta_Q$. The bound $\beta_E$ can be calculated with the help of another graph function, known as the fractional packing number of G.


Definition 59. The fractional packing number $\alpha^*(G)$ of a graph G is defined by

$$\alpha^*(G) = \max \left\{ \sum_i p_i \;:\; 0 \le p_i \le 1 \ \text{and} \ \sum_{i \in C} p_i \le 1, \ C \ \text{any clique of} \ G \right\}.$$
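As a worked instance of this definition, consider the cycle $C_n$ with $n \ge 4$, whose only cliques are single vertices and edges. The short derivation below is a sketch added for illustration and uses only the definition above; indices are understood modulo $n$.

```latex
% Each edge {i, i+1} of C_n is a clique, so p_i + p_{i+1} <= 1.
% Summing over the n edges counts every vertex exactly twice:
\sum_{i=1}^{n} \left( p_i + p_{i+1} \right) = 2 \sum_{i=1}^{n} p_i \le n
\quad \Longrightarrow \quad
\sum_{i=1}^{n} p_i \le \frac{n}{2}.
% The uniform assignment p_i = 1/2 satisfies every clique constraint
% and attains this value, hence
\alpha^*(C_n) = \frac{n}{2}, \qquad \text{in particular} \quad \alpha^*(C_5) = \frac{5}{2}.
```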


Theorem 29 (Cabello, Severini, and Winter, 2010). Given a graph G,

$$\beta_C = \alpha(G), \qquad \beta_Q = \vartheta(G), \qquad \beta_E = \alpha^*(G),$$

where $\alpha(G)$ is the independence number of G, $\vartheta(G)$ is the Lovász number of G and $\alpha^*(G)$ is the fractional packing number of G.

This result follows directly from the observation that $\mathcal{E}_C(G) = STAB(G)$, $\mathcal{E}_Q(G) = TH(G)$ and $\mathcal{E}_E(G) = QSTAB(G)$ and the well-known fact from computer science literature that $\alpha(G)$, $\vartheta(G)$, $\alpha^*(G)$ are the maximum values of $\sum_i p_i$ over STAB(G), TH(G), and QSTAB(G) respectively [Knu94, Ros67]. Nonetheless, we provide a proof here because it may help us to understand the physical significance of these graph functions.


Proof. The classical bound is achieved at a vertex of the noncontextual polytope. For such a distribution, each $p_i$ is equal to zero or one. If $i$ and $j$ are connected by an edge in G they represent different outcomes of the same measurement and hence $p_i$ and $p_j$ cannot both be equal to one. Hence the set of indices $i$ such that $p_i = 1$ is an independent set and can have at most $\alpha(G)$ elements. This implies that $\beta_C \le \alpha(G)$, and equality is achieved if we choose any independent set $I \subset V(G)$ with $\alpha(G)$ elements and define $p_i = 1$ if and only if $i \in I$.

The quantum bound is achieved when we use a pure state $|\psi\rangle$. Let $P_i$ be the projector associated to vertex $i$ and

$$|v_i\rangle = \frac{P_i|\psi\rangle}{\|P_i|\psi\rangle\|}.$$

If $i$ and $j$ are connected by an edge in G, the corresponding projectors are orthogonal and the vectors $|v_i\rangle$ and $|v_j\rangle$ are also orthogonal. Hence, the set of vectors $|v_i\rangle$ and $|\psi\rangle$ provide an orthogonal representation for G and hence

$$\sum_i p_i = \sum_i \langle\psi|P_i|\psi\rangle = \sum_i |\langle\psi|v_i\rangle|^2 \le \vartheta(G).$$

On the other hand, given an optimal orthogonal representation $\{|v_i\rangle\}$ for G and the corresponding state $|\psi\rangle$, let $P_i = |v_i\rangle\langle v_i|$. The projectors $P_i$ and $P_j$ are orthogonal if $i$ and $j$ are connected in G, and hence the $P_i$ and $|\psi\rangle$ provide a quantum realization achieving the upper bound $\vartheta(G)$.

The equality $\beta_E = \alpha^*(G)$ follows directly from the definition of $\alpha^*$: the restriction $0 \le p_i \le 1$ is satisfied if and only if the $p_i$ represent probabilities, and the condition that $\sum_{i \in C} p_i \le 1$ for any clique C of G is exactly the demand that the E principle be satisfied by the distribution.

□


We can also calculate the maximum of general linear functions

$$S_w = \sum_i w_i p_i, \qquad w_i > 0, \qquad (2.20)$$

using the weighted versions of $\alpha$, $\vartheta$ and $\alpha^*$ [Knu94, Ros67], as shown by Cabello, Severini, and Winter in reference [CSW14].


Example 23 (A new version of the n-cycle inequalities). The simplest exclusivity graph for which $\beta_C < \beta_Q$ is the pentagon [CDLP13]. It can be proven by inspection that $\beta_C = 2$. The quantum bound is $\beta_Q = \sqrt{5}$, as shown by Lovász's original calculation of $\vartheta(C_5)$ [Lov79]. The maximum value obtained with E-distributions is $\frac{5}{2}$, which can be reached when all events have probability equal to $\frac{1}{2}$.

When G is any n-cycle with n odd, we can also prove by inspection that the classical bound is $\beta_C = \frac{n-1}{2}$. The quantum bound can also be explicitly calculated, and we have that

$$\beta_Q = \frac{n\cos(\pi/n)}{1 + \cos(\pi/n)},$$

which is equal to $\sqrt{5}$ for $n = 5$. The maximum obtained with E-distributions is $\frac{n}{2}$, which can be reached when all events have probability equal to $\frac{1}{2}$.

If n is even, $C_n$ is a bipartite graph, and the vertices in one part of the bipartition define a maximum independent set. The parts have the same size, and hence the classical bound is $\frac{n}{2}$. The distribution that assigns probability $\frac{1}{2}$ to all vertices realizes the bound $\beta_E$, which is then equal to $\beta_C$. The quantum bound $\beta_Q$ is sandwiched between $\beta_C$ and $\beta_E$, and hence we conclude that $\beta_Q$ is also equal to $\frac{n}{2}$.
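The quantum value for odd cycles can be checked numerically with an explicit three-dimensional realization. The sketch below uses the well-known "umbrella" configuration of unit vectors, with adjacent vertices assigned orthogonal vectors and the state taken along the symmetry axis; the function names are hypothetical, and the code only verifies that this particular realization reaches the value $n\cos(\pi/n)/(1+\cos(\pi/n))$, not that no realization does better.

```python
import math


def dot(u, v):
    """Euclidean inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def umbrella(n):
    """Unit vectors in R^3 for the odd cycle C_n, with adjacent vertices orthogonal."""
    c = math.cos(math.pi / n)
    # Angle between each vector and the symmetry axis, fixed by the orthogonality condition.
    cos_theta = math.sqrt(c / (1 + c))
    sin_theta = math.sqrt(1 - cos_theta ** 2)
    vectors = []
    for k in range(n):
        phi = k * math.pi * (n - 1) / n  # consecutive vertices are rotated by (n-1)*pi/n
        vectors.append((sin_theta * math.cos(phi),
                        sin_theta * math.sin(phi),
                        cos_theta))
    return vectors


def quantum_value(n):
    """Sum of the probabilities |<psi|v_i>|^2 for the state along the symmetry axis."""
    psi = (0.0, 0.0, 1.0)
    return sum(dot(psi, v) ** 2 for v in umbrella(n))


print(quantum_value(5), math.sqrt(5))
```

For $n = 5$ the realization gives $\sqrt{5}$, strictly between the classical value 2 and the E-principle value $5/2$.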


2.11 The quest for the largest contextuality in 
nature 

The connection between the classical and quantum bounds for noncontextuality inequalities and graph theory allows one to study the violation of such inequalities focusing only on the graph itself. To study how quantum representations may differ from classical ones we seek graphs with "large" violations. In this section we show some families of graphs with this behavior and present the known results about the growth of both $\alpha(G)$ and $\vartheta(G)$ with the number of vertices of G.

The measure of violation we propose is the ratio $\frac{\vartheta(G)}{\alpha(G)}$ as a function of the number of vertices in the graph G, which represents the number of possible outcomes (elements of $\mathcal{T}$ in the experiment).


2.11.1 The quantum gambler 


A famous bookmaker accepts all kinds of bets. A gambler brings a preparation device 
and a set of measurement devices. The preparation device works on demand, always 
preparing the same known state. The compatibility structure of the measurement devices 
is also known, and exclusiveness can be directly verified. 

A set of n events with vertex-transitive exclusivity graph G is picked. The state is such that all events in this set have equal probability p. The gambler chooses one of the events and bets c units of money that this event will happen. If this is the case, the bookmaker agrees to pay her

$$\frac{c}{p + \epsilon} \qquad (2.21)$$

units of money. The value of $\epsilon$ is chosen in such a way that the bookmaker guarantees his profit after many rounds of the game.

If the bookmaker believes the system to be classical, the prize will be calculated using $p = \frac{\alpha(G)}{n}$. If the gambler is able to arrange the same scenario in a quantum system, $p = \frac{\vartheta(G)}{n}$. This means that a quantum gambler, playing against a classical bookmaker, will increase her profit after many rounds by a factor of $\frac{\vartheta(G)}{\alpha(G)}$. Hence the gambler will seek the scenario where this ratio is as large as possible.
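The game can be made concrete with the pentagon. The sketch below assumes that the bookmaker prices the bet with the classical probability $\alpha(G)/n$ while the gambler's events actually occur with the quantum probability $\vartheta(G)/n$; the value of $\epsilon$ and all function names are illustrative choices, not taken from the text.

```python
import math


def payout_per_unit(p_book, eps):
    """Amount paid per unit of money bet when the chosen event occurs."""
    return 1.0 / (p_book + eps)


def expected_return(p_true, p_book, eps):
    """Expected amount received per unit bet if the event occurs with probability p_true."""
    return p_true * payout_per_unit(p_book, eps)


n = 5                    # number of events (vertices of the pentagon)
alpha = 2                # independence number of C_5
theta = math.sqrt(5)     # Lovász number of C_5
eps = 0.01               # bookmaker's margin (illustrative value)

p_classical = alpha / n
p_quantum = theta / n

classical = expected_return(p_classical, p_classical, eps)
quantum = expected_return(p_quantum, p_classical, eps)
gain_factor = quantum / classical

print(classical, quantum, gain_factor)
```

With these numbers a classical gambler recovers slightly less than one unit per unit bet, so the bookmaker profits, while the quantum gambler recovers more than one unit. The ratio of the two expected returns equals $\vartheta(C_5)/\alpha(C_5) = \sqrt{5}/2$ regardless of the margin.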


2.11.2 The growth of the ratio $\vartheta/\alpha$

An important family of noncontextuality inequalities is the n-cycle inequalities, presented in example 20. In this case, the compatibility graph is a cycle with n vertices. If n is odd, the exclusivity graph G is the prism graph of order n, $Y_n$, and if n is even, the exclusivity graph is the Möbius ladder of order $2n$, $M_{2n}$. These graphs are shown in figure 2.2. If n is odd,

$$\frac{\vartheta(Y_n)}{\alpha(Y_n)} = \frac{2n\cos(\pi/n)}{(1 + \cos(\pi/n))(n - 2)},$$

and for n even

$$\frac{\vartheta(M_{2n})}{\alpha(M_{2n})} = \frac{2n(1 + \cos(\pi/n))}{n - 2}.$$

The quantum maximum can be obtained in a system of dimension three for n odd, and four for n even [AQB+13]. In this case, the quantum maximum approaches the classical maximum as the number of vertices n grows.


Something similar happens for the inequalities shown in example 23 when the n-cycle is used as exclusivity graph, with n odd. In this case

$$\frac{\vartheta(C_n)}{\alpha(C_n)} = \frac{2n\cos(\pi/n)}{(1 + \cos(\pi/n))(n - 2)},$$

and the quantum maximum also approaches the classical bound.

In both cases, the differences between classical and quantum distributions become smaller when n grows. We want to find families of graphs with the opposite behavior: we seek situations in which the ratio $\vartheta/\alpha$ grows as fast as possible.

First we notice that if we fix $\alpha(G) < k$, there is a limit for the ratio $\vartheta/\alpha$.

Theorem 30. For every $k \in \mathbb{N}$ there exists an absolute constant $M_k$ such that for any graph G on v vertices with $\alpha(G) < k$, $\vartheta(G) < M_k v^{1 - 2/k}$.

The result above is Theorem 5.1 of reference [AK98]. It generalizes the result of [KK83] for k = 3, for which $M_3 = 2^{2/3}$. Although there are no explicit constructions for general k, in [Alo94] the author shows a family of graphs with $\alpha = 2$ approaching $\frac{\vartheta}{\alpha} = v^{1/3}$.
The graphs depend on a parameter r that cannot be a multiple of 3. The number of vertices is $2^{3r}$. For r = 2 it is a graph with 64 vertices and its complement is a graph formed by 16 unconnected squares. In this case $\vartheta = \alpha$ and it does not exhibit quantum violation. We have computed the adjacency matrix for the complement of the graph we want for r = 4. It has over 2 million edges. We do not know if for r = 4, 5 the corresponding inequalities have quantum violation. For r > 6 we have $\vartheta > \alpha$. These graphs are Cayley graphs and, as a consequence, regular and vertex-transitive.

If we do not fix the noncontextual bound we can obtain larger violations with simpler 
graphs, for which the number of vertices does not grow so fast. 

Theorem 31. For every $\epsilon > 0$ there is an explicit family of graphs for which $\vartheta \ge (\frac{1}{2} - \epsilon) v$ and $\alpha \le v^{\delta(\epsilon)}$, $\delta(\epsilon) < 1$.


This is Theorem 6.1 in [AK98]. For a pair of integers $q > s > 0$, $G(q, s)$ will be the graph on $v = \binom{2q}{q}$ vertices, each vertex corresponding to a q-subset of $\{1, 2, \ldots, 2q\}$. Two vertices are adjacent iff their intersection has exactly s elements. For small values of q and s we have:
    q    s    α       ϑ
    4    1    17      23
    4    2    10      10
    4    3    14      14
    5    1    > 55    94.5
    5    2    > 27    42
    5    3    > 12    18.67
    5    4    > 28    42

For this family, the authors provide an orthonormal representation that achieves the lower bound on $\vartheta$ in dimension $2q$. This orthonormal representation provides a state and measurements that we can use to achieve this amount of violation.
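The graphs $G(q, s)$ defined above are easy to generate for small parameters. The following sketch builds the vertex and edge lists directly from the definition; the function name and the representation of vertices as frozensets are choices made for this illustration, and the code does not attempt to compute $\alpha$ or $\vartheta$.

```python
from itertools import combinations
from math import comb


def build_graph(q, s):
    """Vertices: q-subsets of {1, ..., 2q}. Edges: pairs intersecting in exactly s elements."""
    vertices = [frozenset(c) for c in combinations(range(1, 2 * q + 1), q)]
    edges = [(a, b) for a, b in combinations(vertices, 2) if len(a & b) == s]
    return vertices, edges


vertices, edges = build_graph(3, 1)
print(len(vertices), comb(6, 3), len(edges))
```

The number of vertices grows as $\binom{2q}{q}$, so already $q = 5$ gives 252 events, which indicates how quickly an experimental implementation of these inequalities becomes demanding.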

Although these are the best explicit constructions, it is already known that they do not reach the maximum violation $\frac{\vartheta}{\alpha}$ as a function of the number of vertices in the graph [Fei95].


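Theorem 32. For every $\epsilon > 0$ there is a graph G on v vertices such that $\frac{\vartheta(G)}{\alpha(G)} > v^{1 - \epsilon}$.

Theorem 33. There exists an infinite family of graphs on v vertices for which $\frac{\vartheta(G)}{\alpha(G)} > \frac{v}{2^{c\sqrt{\log(v)}}}$.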

Although the results above prove the existence of families with larger ratio than the ones considered above, their proofs are based on the probabilistic method and there is no explicit construction approaching these lower bounds [AS04]. It is also not known if these bounds are tight.

It is interesting to notice that the large growth of the ratio $\frac{\vartheta}{\alpha}$ was bad news for research in graph theory. While $\vartheta(G)$ is easy to compute, other quantities such as the independence number and the Shannon capacity of the graph are hard to calculate in general and both are upper bounded by $\vartheta(G)$ [Lov79]. A large growth of $\frac{\vartheta}{\alpha}$ shows that the bound for $\alpha$ is far from being tight, and hence this number cannot be used in general as a good approximation to the independence number.

As the study of these families may help us to understand how quantum distributions can go beyond the noncontextual ones, we believe that there may be some practical applications to high violations of noncontextuality inequalities. As an example, we conjecture that there may be a connection between these large violations and the certification of randomness in the data obtained in the experiments [PAM+10, UZZ+13].


2.12 Final Remarks 

In this chapter we have discussed a way of proving the impossibility of noncontextual hidden-variable models. The set of noncontextual distributions is a polytope and hence can be described by a finite set of linear inequalities, violated by some quantum distributions, which proves that the quantum statistics cannot be reproduced by these models in all situations.

The first approach to noncontextuality we have discussed is through the compatibility graph, which coincides with the usual approach to quantum contextuality (as can be seen in appendix A). In this case, an experimentalist is given a set of possible measurements to perform in a physical system, and the compatibility structure of this set is encoded in the compatibility graph of the scenario. The probability distributions for each context are collected to form an empirical model with the no-disturbance property. The set of noncontextual distributions is a polytope and the quantum set is in general larger, as proven by the fact that some quantum distributions do not satisfy all noncontextuality inequalities in the H-description of the noncontextual set.

The mathematical formalism of this scenario can be translated into a sheaf-theoretic 
language, which provides a characterization of the phenomena of contextuality in terms 
of obstructions to the existence of global sections in a presheaf, which opens the door 
to the use of the methods of sheaf theory to the study of contextuality. 

When all coefficients of the inequality are equal to one, the local and quantum bounds for a noncontextuality inequality can be found with the help of another graph, the exclusivity graph of the inequality. The classical bound is equal to the independence number of the exclusivity graph and the quantum bound is upper bounded by the Lovász number of this graph. Many important inequalities can be written in this form, including the n-cycle inequalities of example 20. The weighted versions of these graph functions can be used to calculate the classical and quantum bounds when the coefficients are not all equal to one, but we will not consider this case here. We refer to [CSW14, Knu94] for more details.

Another perspective on contextuality is given by the exclusivity graph approach. We start with the exclusivity graph G, where each vertex $i$ represents an event, a transformation $P_i \in \mathcal{T}$ in a probabilistic model. If $(i,j) \in E(G)$ the events $i$ and $j$ are exclusive, that is, there is a measurement among whose outcomes are $P_i$ and $P_j$. The main difference between this approach and the compatibility graph approach is that in this case we make no restriction on the compatibility scenario leading to the exclusivity structure of the events.

In this new perspective, the noncontextuality assumption means that the value associated to a projector P by the hidden-variable model is independent of the other projectors used to define the measurement. As we have seen, the same transformation corresponds to an outcome of several different measurements. Then, whenever P corresponds to an outcome of different measurements $M_1, M_2, \ldots, M_n$, a noncontextual hidden-variable model assigns the outcome corresponding to P to some $M_i$ if and only if it does for all other $M_j$.

The set of noncontextual distributions is once more a polytope, contained in the set 
of quantum distributions which is generally larger. It can be described by a finite set of 
noncontextuality inequalities, violated by quantum distributions in many situations. 

When all coefficients of the inequality are equal to one, the local, quantum and generalized bounds for the noncontextuality inequality can be found using only the exclusivity graph of the inequality. The classical bound is equal to the independence number of the exclusivity graph and the quantum bound is equal to the Lovász number of this graph. In this case we have an equality between the quantum bound and the Lovász number because we do not have extra restrictions imposed by a specific compatibility structure.

The most general distributions we consider have to satisfy the Exclusivity principle, and for this kind of distribution the bound is equal to the fractional packing number of the exclusivity graph. This principle will be used later on in chapter 3 in our attempt to understand why quantum theory is not more noncontextual than it is.

Many important inequalities can be written in this form, including the n-cycle inequalities of example 23. Once more, the weighted versions of these graph functions can be used to calculate the bounds when the coefficients are not all equal to one, but we will also not consider this case here. We refer to [CSW14, Knu94] for more details.

We believe that, besides their importance for the foundations of quantum theory, large violations of noncontextuality inequalities may have practical applications such as amplification of randomness. We have presented the known results about the growth of the ratio $\frac{\vartheta}{\alpha}$, seeking the families of graphs for which this ratio grows as fast as possible. Unfortunately, many of the known results are based on the probabilistic method and there is no explicit construction of the graphs, or the explicit construction is so complicated that it makes any experimental implementation impossible.
Third Chapter

What explains the Lovász bound?


If the truth be told, few physicists 
have ever really felt comfortable 
with quantum theory. 

Philip Ball, [Bal13]


The mathematical formulation of quantum theory is almost one century old, and during
this time a number of brilliant scientists around the world have built up considerable
knowledge about it, both on its theoretical aspects and on the experimental control of quantum
systems. “Physicists are capable of making stunningly accurate calculations about molecular
structure, high-energy particle collisions, semiconductor behavior, spectral emissions
and much more” [Bal13]. They have learned how to manipulate quantum systems for information
processing. They know a lot about the structure of matter and how to use it for our
purposes. This has certainly had a great impact on the development of current technology.

From the practical point of view we may say that physicists have a good relationship
with quantum theory. But, just like Einstein, Podolsky and Rosen in 1935, you can get
into serious trouble when you try to understand the meaning of the mathematical objects,
especially if you try to apply the reasoning of classical physics we are used to.

This situation led many people to adopt the way of thinking known as the Copenhagen
interpretation. According to this line of thought, the weirdness of quantum theory reflects
fundamental limits on what can be known about nature and we just have to accept it.
Quantum theory should not be understood but seen just as a tool to get practical results.
As famously phrased by David Mermin, physicists should “shut up and calculate” [Mer89].

Not everyone is happy with this interpretation, including Mermin himself [Mer14].
Physics is not just about getting practical results; it is also about understanding how
nature behaves. Since the EPR vs Bohr debate, many have tried to understand (or
question, like EPR) the abstract formulation of quantum theory from more compelling
physical arguments. This is one of the most seductive scientific challenges in recent
times: deriving quantum theory from simple physical principles.

The starting point is assuming general probabilistic theories allowing for probability 
distributions that are more general than those that arise in quantum theory, and the goal 
is to find principles that pick out quantum theory from this landscape of possible theories. 
There are diverse ideas on how to do this, and at least three different approaches to the
problem stand out. 

The first one consists of reconstructing quantum theory as a purely operational
probabilistic theory that follows from some set of axioms. The idea is to demolish
the abstract entities and start again. By imposing a small number of reasonable physical
principles, the proponents of this approach manage to prove that the only consistent
probabilistic theory is quantum theory. Although really successful, this approach does not
resolve the issue completely, especially because some of the principles imposed do not
sound so natural. This dissatisfaction is very well phrased by Chris Fuchs [Fuc11]:

There is no doubt that this is invaluable work, particularly for our understanding
of the intricate connections between so many quantum information
protocols. But to me, it seems to miss the mark for an ultimate understanding
of quantum theory; I am left hungry. I still want to know what strange
property of matter forces this formalism upon our information accounting.

I would like to see an axiomatic system that goes for the weirdest part of 
quantum theory. 


The second approach to the problem goes in this direction. Instead of trying to
reconstruct quantum theory, the idea is to understand what physical principles explain
one of the weirdest parts of quantum theory: nonlocality. Many different principles have
been proposed, which we leave for Appendix C.

The third approach consists of identifying principles that explain the set of quantum
contextual correlations without restrictions imposed by a specific experimental scenario.
The belief that identifying the physical principle responsible for quantum contextuality
can be more successful than previous approaches is based on two observations. On one
hand, when focusing on quantum contextuality we are just considering a natural extension
of quantum nonlocality which is free of certain restrictions (composite systems, space-like
separated tests with multiple observers, entangled states) which play no role in the rules
of quantum theory, although they are crucial for many important applications, especially
in communication protocols (see, for example, references [Wiki, HHHH09, BBC+93]
and other references therein), and played an important role in the historical debate on
whether or not quantum theory is a complete theory.

On the other hand, it is based on the observation that, while calculating the maximum
value of quantum correlations for nonlocality scenarios is a mathematically complex
problem (see [PV10] to see how complex it is to get the quantum maximum for a simple
inequality like $I_{3322}$), calculating the maximum contextual value of quantum correlations
for an arbitrary scenario characterized by its exclusivity graph is simple: as we proved
in section 2.10, the maximum quantum contextuality is given by the Lovász number of
its exclusivity graph, which is the solution of a semidefinite program [Lov95]. Indeed,
from the graph approach perspective, the difficulties in characterizing quantum nonlocal
correlations are due to the mathematical difficulties associated with the extra constraints
resulting from enforcing a particular labeling of the events of an exclusivity structure in
terms of parties, local settings, and outcomes [SBBC13], rather than a fundamental
difficulty related to the principles of quantum theory.

Within this line of research, the most promising candidate for being the fundamental 
principle of quantum contextuality is the Exclusivity principle, which can be stated as 
follows (see principle 1):

The sum of the probabilities of a set of pairwise exclusive events cannot exceed 1. 


The Exclusivity principle was suggested by the works of Specker [Spe60] and Wright
[Wri78] and used in [CSW10] as an upper bound for quantum contextuality. However, its
fundamental importance for QM was conjectured long before [Spe09]. It was promoted
to a possible fundamental principle by the observation that it explains the maximum
quantum violation of the simplest noncontextuality inequality, as we will see in section 3.2.

However, with this extra restriction, the Exclusivity principle cannot single out some
quantum nonlocal correlations [FSA+13].

By itself, the Exclusivity principle singles out the maximum quantum value for some
Bell and noncontextuality inequalities [Cab13b]. According to the results of section 3.1,
this happens whenever $\vartheta(G) = \alpha^*(G)$. We can get better bounds if we apply the E principle
to more sophisticated scenarios. When applied to the OR product of two copies of
the exclusivity graph, which physically may be seen as two independent realizations of
the same experiment, the Exclusivity principle singles out the maximum quantum value
for experiments whose exclusivity graphs are vertex-transitive and self-complementary
[Cab13b], which include the simplest noncontextuality inequality, namely the KCBS inequality
presented in example 23. Moreover, either applied to two copies of the exclusivity
graph of the CHSH inequality or of a simpler inequality, the Exclusivity principle excludes
the so-called PR boxes and provides an upper bound to the maximum violation of the
CHSH inequality which is close to the Tsirelson bound [FSA+13, Cab13b] (see appendix
C). In addition, when applied to the OR product of an infinite number of copies, there is
strong evidence that the Exclusivity principle singles out the maximum quantum violation
of the noncontextuality inequalities whose exclusivity graph is the complement of odd
cycles on $n \geq 7$ vertices [CDLP13]. Indeed, it might also be the case that, when applied
to an infinite number of copies, the Exclusivity principle singles out the Tsirelson bound
of the CHSH inequality [FSA+13, Cab13b].

Further evidence of the strength of the Exclusivity principle was recently found by
Yan [Yan13]. By exploiting Lemma 1 in [Lov79], Yan has proven that, if all correlations
predicted by quantum theory for an experiment with exclusivity graph $G$ are reachable in
nature, then the Exclusivity principle singles out the maximum value of the correlations
produced by an experiment whose exclusivity graph is the complement of $G$, denoted as
$\overline{G}$.

We recently proved three stronger consequences of the E principle [ATC14]. The
Exclusivity principle singles out the entire set of quantum correlations associated to
any exclusivity graph assuming the set of quantum correlations for the complementary
graph. Moreover, for self-complementary graphs, the Exclusivity principle, by itself (i.e.,
without further assumptions), excludes any set of correlations strictly larger than the
quantum set. Finally, for vertex-transitive graphs, the Exclusivity principle singles out the
maximum value for the quantum correlations assuming only the quantum maximum for
the complementary graph. These results show that the Exclusivity principle goes beyond
any other proposed principle towards the objective of singling out quantum correlations.

In this chapter we will prove all these results in detail. In section 3.1 we review the
noncontextuality inequalities under consideration, the definition of the Exclusivity principle
and other important concepts. In section 3.2 we explain how the principle applied to
two copies of the pentagon singles out the quantum maximum for this graph. In section
3.3 we show how the principle can be used to connect the sets of quantum correlations
for $G$ and $\overline{G}$, and how this connection is sufficient for ruling out any distribution outside
the quantum set in many important cases. In section 3.4 we show that something similar
can be done with graph operations other than complementation and, as a consequence,
we prove that the Exclusivity principle explains the quantum maximum for all vertex-transitive
graphs with 10 vertices, except two. We end with our final remarks in section 3.5.
Consequences of the E principle under Bell-scenario restrictions are outside the scope of
the present thesis (and chapter), but a small introduction can be found in section C.6.


3.1 The Exclusivity Principle 

First, let us briefly review some of the definitions and concepts introduced in section
2.10. We start with an exclusivity graph $G = (V, E)$. Each vertex $i$ of $G$ corresponds
to a transformation $T_i \in \mathcal{T}$ in a physical system and two vertices are connected by an
edge if they are exclusive, that is, if they can be two different outcomes of the same
measurement. For a given state of the system, there is a probability $p_i$ associated to
each vertex $i \in V$. We collect all these probabilities in a vector $p \in \mathbb{R}^{|V|}$. The set of
possible vectors depends on the physical theory used to describe the system and we will
see how the Exclusivity principle (principle 1) constrains this set.

Could this principle be the reason why quantum theory is not more noncontextual?
Can it explain the quantum maximum for noncontextuality inequalities? It is not clear
what happens in general, but for a special class of inequalities (or graphs) many results
supporting a positive answer have been found. We will apply the E principle to sums of
the type
$$S_G = \sum_{i \in V} p_i, \tag{3.1}$$


that is, we set $\gamma_i = 1$ for all $i$ in definition 58. For noncontextual distributions we know
that
$$S_G \overset{NC}{\leq} \alpha(G), \tag{3.2}$$
while for quantum distributions we have
$$S_G \leq \vartheta(G), \tag{3.3}$$
where $\vartheta(G)$ is the Lovász number of $G$.

The first question is whether the Exclusivity principle is capable of explaining the quantum
bound $\vartheta(G)$. For many different cases, a lot of them with special importance for the
study of contextuality, this is indeed the case. A much more ambitious question is whether
this principle is enough to single out the set of quantum distributions and not just the
quantum maximum. Again, we are able to exhibit an important family of graphs for which
this is true.


3.2 The Pentagon 


The Exclusivity principle singles out the quantum maximum for the simplest noncontextuality
inequality.

Theorem 34 (Cabello, 2013). For $G = C_5$, the maximum value for $S_G$ allowed by
theories satisfying the Exclusivity principle is $\sqrt{5}$, which is also the maximum for quantum
distributions.


Proof. Let $\{e_i\}$ and $\{e'_i\}$ be two sets of 5 events with exclusivity graph $G$, as shown in
figure 3.1, such that $e_i$ and $e'_i$ are independent.

Figure 3.1: Exclusivity graphs of the sets of events $e_i$ and $e'_i$.

Define the event $f_i = e_i \wedge e'_i$, which is true if and only if both $e_i$ and $e'_i$ are true. Note
that the exclusivity graph of the events $\{f_i\}$ is the complete graph on 5 vertices: the second
copy is labelled so that $e'_i$ and $e'_j$ are exclusive whenever $e_i$ and $e_j$ are not (which is
possible because $C_5$ is self-complementary), and hence $\{f_i\}$ is a set of pairwise mutually
exclusive events.

Since $e_i$ and $e'_i$ are independent, $p(f_i) = p(e_i)\,p(e'_i)$. Using the Exclusivity principle
we have
$$\sum_i p(f_i) = \sum_i p(e_i)\,p(e'_i) \leq 1.$$
Using the symmetry of the pentagon, we can assume (see lemma 1 below) that the
maximum is reached when all the probabilities are the same, that is,
$$p(e_i) = p(e'_i) = P, \quad \forall\, i \in V.$$
Hence we have
$$\sum_i P^2 = 5P^2 \leq 1,$$
which implies that
$$P \leq \frac{1}{\sqrt{5}}.$$
Now, if we substitute this value into equation (3.1) for $S_G$ we have
$$S_G = \sum_i P \leq \sqrt{5}.$$
□
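
The bound in the proof can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original argument: it builds the standard Lovász "umbrella" of five unit vectors in $\mathbb{R}^3$ (the variable names `psi` and `ribs` are hypothetical), verifies that exclusive events are represented by orthogonal vectors, and evaluates both $S_G$ and the two-copy sum $\sum_i p(e_i)p(e'_i)$. It also shows that the assignment $p_i = 1/2$, which respects the Exclusivity principle on a single copy, violates it on two copies.

```python
import math

n = 5
cos2 = 1 / math.sqrt(5)                 # squared overlap of each rib with the handle
c, s = math.sqrt(cos2), math.sqrt(1 - cos2)

# Lovasz "umbrella": rib k is rotated by 4*pi*k/5 around the handle psi.
psi = (1.0, 0.0, 0.0)
ribs = [(c,
         s * math.cos(4 * math.pi * k / n),
         s * math.sin(4 * math.pi * k / n)) for k in range(n)]

def dot(u, v):
    """Euclidean inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))

# Adjacent vertices of C_5 are exclusive events, hence orthogonal vectors.
adjacent_overlaps = [dot(ribs[k], ribs[(k + 1) % n]) for k in range(n)]

p = [dot(psi, v) ** 2 for v in ribs]    # quantum probabilities p(e_i)
S = sum(p)                              # value of S_G
two_copy_sum = sum(pi * pi for pi in p) # sum_i p(e_i) p(e'_i) with p = p'

# p_i = 1/2 is allowed by the E principle on one pentagon, but not on two copies.
naive_two_copy_sum = n * 0.5 * 0.5
```

With these vectors each $p(e_i)$ equals $1/\sqrt{5}$, so the two-copy constraint is saturated exactly when $S_G$ reaches $\sqrt{5}$.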


3.3 The exclusivity principle forbids sets of 
correlations larger than the quantum set 


The idea used in the previous section to derive the quantum bound for the pentagon
using the Exclusivity principle can be applied to show that there is a connection between
the sets of quantum distributions for $G$ and $\overline{G}$. Yan first used it in reference [Yan13],
where he proves the following:

Theorem 35 (Yan, 2013). Given the set of quantum distributions for $\overline{G}$, the E principle
singles out the quantum maximum for $G$.

Proof. Let $\{e_i\}$ be a set of $n$ events with exclusivity graph $G$ and $\{e'_i\}$ be a set of $n$
events with exclusivity graph $\overline{G}$, such that $e_i$ and $e'_i$ are independent. Define the event
$f_i = e_i \wedge e'_i$, which is true if and only if both $e_i$ and $e'_i$ are true. Note that the exclusivity
graph of the events $\{f_i\}$ is the complete graph on $n$ vertices because $\{f_i\}$ is a set of
pairwise mutually exclusive events. The Exclusivity principle implies that
$$\sum_i p(f_i) = \sum_i p(e_i)\,p(e'_i) \leq 1.$$
Suppose that the distribution $p(e'_i)$ is given by
$$p(e'_i) = |\langle \psi | v_i \rangle|^2. \tag{3.4}$$
Then
$$\sum_i p(f_i) = \sum_i p(e_i)\,p(e'_i) = \sum_i p(e_i)\,|\langle \psi | v_i \rangle|^2 \leq 1,$$
and hence
$$\sum_i p(e_i)\, \min_i |\langle \psi | v_i \rangle|^2 \leq \sum_i p(e_i)\,|\langle \psi | v_i \rangle|^2 \leq 1,$$
which implies that
$$\sum_i p(e_i) \leq \max_i \frac{1}{|\langle \psi | v_i \rangle|^2}.$$
This inequality should hold for any normalized $|\psi\rangle$ and any orthogonal representation
$\{|v_i\rangle\}$, and hence
$$\sum_i p(e_i) \leq \min_{|\psi\rangle, \{|v_i\rangle\}} \max_i \frac{1}{|\langle \psi | v_i \rangle|^2}.$$
The right-hand side is just the Lovász number of $G$ (see [Lov79, Knu94]). Hence, we
conclude that if all quantum distributions given by equation (3.4) can be reached and if
the Exclusivity principle holds, the maximum value of $S_G$ cannot exceed the quantum
bound. □


Let us show that, under the same assumptions as in the previous theorem, it is possible
not only to single out the quantum maximum but also the entire set of quantum
correlations.


Proposition 1 (Amaral, Terra Cunha, Cabello, 2014). Given the quantum set $\mathcal{E}_Q(\overline{G})$,
the Exclusivity principle singles out the quantum set $\mathcal{E}_Q(G)$.

Proof. Let $\{e_i\}$ be a set of $n$ events with exclusivity graph $G$ and $\{f_i\}$ be a set of $n$ events
with exclusivity graph $\overline{G}$, such that $e_i$ and $f_i$ are independent. Define the event $g_i$, which
is true if and only if both $e_i$ and $f_i$ are true, $g_i = e_i \wedge f_i$. Note that the exclusivity graph
of the events $\{g_i\}$ is the complete graph on $n$ vertices because $\{g_i\}$ is a set of pairwise
mutually exclusive events.

Since $e_i$ and $f_i$ are independent, $p(g_i) = P_i \overline{P}_i$, where $P_i = p(e_i)$ and $\overline{P}_i = p(f_i)$.
Using the Exclusivity principle we have
$$\sum_i P_i \overline{P}_i \leq 1. \tag{3.5}$$

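Now we use corollary 3.4 and theorem 3.5 in reference [GLS86]:

Theorem 36. The set $TH(G)$ can be written in the following ways:
$$TH(G) = \left\{ P \in \mathbb{R}^n ;\ P_i \geq 0,\ \vartheta(\overline{G}, P) \leq 1 \right\}, \tag{3.6}$$
where
$$\vartheta(\overline{G}, P) = \max \left\{ \sum_i P_i \overline{P}_i ;\ \overline{P} \in TH(\overline{G}) \right\}, \tag{3.7}$$
and
$$TH(G) = \left\{ P \in \mathbb{R}^n ;\ P_i = |\langle \psi | v_i \rangle|^2,\ \langle \psi | \psi \rangle = 1,\ \{|v_i\rangle\} \text{ orthonormal representation for } G \right\}. \tag{3.8}$$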

Equation (3.6) implies that, for a given $P$, equation (3.5) will be satisfied for all $\overline{P} \in TH(\overline{G})$
if and only if $P$ belongs to $TH(G)$. Equation (3.8) shows that $TH(G) = \mathcal{E}_Q(G)$. Then
we conclude that if the set of allowed distributions for $\overline{G}$ is $TH(\overline{G}) = \mathcal{E}_Q(\overline{G})$, theorem
36 implies that the distributions in $G$ allowed by the Exclusivity principle belong to
$\mathcal{E}_Q(G)$. □

Physically, the proof above can be interpreted as follows: assuming that nature
allows all quantum distributions for $\overline{G}$, the Exclusivity principle singles out the quantum
distributions for $G$.

Proposition 1 does not imply that the Exclusivity principle, by itself, singles out the
quantum correlations for $G$, since we have assumed quantum theory for $\overline{G}$. Nonetheless,
it is remarkable that the Exclusivity principle connects the correlations of two, a
priori, completely different experiments on two completely different quantum systems.
For example, if $G$ is the n-cycle $C_n$ with $n$ odd, the tests of the maximum quantum
violation of the corresponding noncontextuality inequalities require systems of dimension
3 [CSW10, CDLP13, LSW11, AQB+13]. However, the tests of the maximum quantum
violation of the noncontextuality inequalities with exclusivity graph $\overline{C}_n$ require systems
of dimension that grows with $n$ [CDLP13]. Similarly, while two qubits are enough for a
test of the maximum quantum violation of the CHSH inequality (see appendix B), the
complementary test is a noncontextuality inequality (not a Bell inequality) that requires
a system of, at least, dimension 5 [Cab13a].

An important consequence of proposition 1 is that the larger the quantum set of
$\overline{G}$, the smaller the quantum set for $G$, since each probability allowed for $\overline{G}$ becomes a
restriction on the possible probabilities for $G$. Such duality gets stronger when $G$ is a
self-complementary graph.


Proposition 2 (Amaral, Terra Cunha, Cabello, 2014). If G is a self-complementary 
graph, the Exclusivity principle, by itself, excludes any set of probability distributions 
strictly larger than the quantum set. 


Proof. Let $X$ be a set of distributions containing $\mathcal{E}_Q(G)$ and let $P \in X \setminus \mathcal{E}_Q(G)$. By
theorem 36, there is at least one $\overline{P} \in \mathcal{E}_Q(\overline{G})$ such that
$$\sum_{i \in V(G)} P_i \overline{P}_i > 1, \tag{3.9}$$
which is in contradiction with the Exclusivity principle. Since $G$ is self-complementary,
after a permutation on the entries given by the isomorphism between $G$ and $\overline{G}$, $\overline{P}$ becomes
an element of $\mathcal{E}_Q(G)$ and hence $P$ and $\overline{P}$ belong to $X$. Expression (3.9) implies that this
set is not allowed by the Exclusivity principle. □


The fact that the Exclusivity principle is sufficient for pinning down the quantum 
correlations as the maximal set of correlations for any self-complementary graph, given 
that the entire quantum set is possible, means that the Exclusivity principle is able to 
single out the quantum correlations for a large number of nonequivalent noncontextuality 
inequalities, including the KCBS one. In contrast, neither information causality, nor 
macroscopic locality, nor local orthogonality have been able to single out the set of 
quantum correlations in any Bell inequality. 

The hypothesis in theorem 35 can be weakened for vertex-transitive graphs. Instead
of assuming the entire set of quantum correlations for $\overline{G}$, the same result can be
proven given only the quantum maximum for $\overline{G}$. The exclusivity graphs of many interesting
inequalities, including CHSH [CHSH69], KCBS [KCBS08], the n-cycle inequalities
[CSW10, CDLP13, LSW11, AQB+13], and the antihole inequalities [CDLP13], are vertex
transitive. A graph is vertex transitive if for any pair $u, v \in V(G)$ there is $\varphi \in \mathrm{Aut}(G)$ such
that $v = \varphi(u)$, where $\mathrm{Aut}(G)$ is the group of automorphisms of $G$ (i.e., the permutations
$\psi$ of the set of vertices such that $u, v \in V(G)$ are adjacent if and only if $\psi(u), \psi(v)$ are
adjacent).


Proposition 3 (Amaral, Terra Cunha, Cabello, 2014). If $G$ is a vertex-transitive graph
on $n$ vertices, given the quantum maximum for $\overline{G}$, the Exclusivity principle singles out
the quantum maximum for $G$.


A sequence of three lemmas proves the result. First we prove that the quantum
maximum for $S$ is attained at a symmetric configuration. Then we prove that the
product of the quantum maxima for $G$ and $\overline{G}$ is bounded from above by the number of
vertices of $G$, and the same from below.


Lemma 1. If $G$ is a vertex-transitive graph, then the quantum maximum for $S = \sum_i p_i$
is attained at the constant distribution $p_i = p_{\max}$.

Proof. Let $P = (p(e_1), p(e_2), \ldots, p(e_n))$ be a distribution reaching the maximum. Given
an automorphism of $G$, $\varphi \in \mathrm{Aut}(G)$, consider the distribution $P_\varphi$ defined as
$p_\varphi(e_i) = p(\varphi(e_i))$. This is a valid quantum distribution, also reaching the maximum for $S$. Define
the distribution
$$Q = \frac{1}{A} \sum_{\varphi \in \mathrm{Aut}(G)} P_\varphi, \tag{3.10}$$
where $A = \#\mathrm{Aut}(G)$. This distribution also reaches the maximum for $S$. Since $G$ is vertex
transitive, given any two vertices of $G$, $e_i$ and $e_j$, there is an automorphism $\psi$ such that
$\psi(e_i) = e_j$. Then,
$$\begin{aligned}
q(e_j) &= q(\psi(e_i)) \\
&= \frac{1}{A} \sum_{\varphi \in \mathrm{Aut}(G)} p(\varphi(\psi(e_i))) \\
&= \frac{1}{A} \sum_{\varphi \in \mathrm{Aut}(G)} p(\varphi \circ \psi(e_i)) \\
&= \frac{1}{A} \sum_{\varphi' \in \mathrm{Aut}(G)} p(\varphi'(e_i)) \\
&= q(e_i).
\end{aligned} \tag{3.11}$$
□
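
The averaging step of lemma 1 can be illustrated on the pentagon. The sketch below is only an illustration: it enumerates the automorphisms of $C_5$ by brute force, checks vertex transitivity, and averages a hypothetical vector `P` over the automorphism group. The vector `P` is an arbitrary assignment chosen for the example and is not claimed to be a quantum distribution; the point is only that the average `Q` is constant and keeps the same sum.

```python
import itertools

n = 5
edges = {frozenset((i, (i + 1) % n)) for i in range(n)}

def is_automorphism(perm):
    """perm maps vertex i to perm[i]; a permutation sending every edge to an
    edge is an automorphism of a finite graph."""
    return all(frozenset((perm[i], perm[j])) in edges
               for i, j in map(tuple, edges))

automorphisms = [perm for perm in itertools.permutations(range(n))
                 if is_automorphism(perm)]

# Vertex transitivity: every vertex can be mapped to every other vertex.
vertex_transitive = all(any(perm[u] == v for perm in automorphisms)
                        for u in range(n) for v in range(n))

# Hypothetical non-uniform assignment, used only to illustrate the averaging.
P = [0.50, 0.10, 0.45, 0.40, 0.50]
A = len(automorphisms)

# Q = (1/A) * sum over automorphisms of P_phi, with p_phi(e_i) = p(phi(e_i)).
Q = [sum(P[perm[i]] for perm in automorphisms) / A for i in range(n)]
```

Because every vertex is sent to every other vertex equally often, each entry of `Q` equals `sum(P) / n`.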


Lemma 2. If $G$ is a vertex-transitive graph on $n$ vertices, then the Exclusivity principle
implies that the quantum maxima for $S(G)$ and for $S(\overline{G})$ obey
$$M_Q(G)\, M_Q(\overline{G}) \overset{E}{\leq} n. \tag{3.12}$$

Proof. Lemma 1 applies for both $G$ and $\overline{G}$, giving $n\, p_{\max} = M_Q(G)$ and
$n\, \overline{p}_{\max} = M_Q(\overline{G})$. Inequality (3.5) for these extremal distributions reads
$$n\, p_{\max}\, \overline{p}_{\max} \overset{E}{\leq} 1, \tag{3.13}$$
which proves the result. □


Lemma 3. If $G$ is a vertex-transitive graph on $n$ vertices, then
$$M_Q(G)\, M_Q(\overline{G}) \geq n. \tag{3.14}$$

Proof. When we recall that the graph approach identifies the quantum maximum with
the Lovász number, as proven in theorem 29, we have that
$$\vartheta(G) = M_Q(G), \qquad \vartheta(\overline{G}) = M_Q(\overline{G}), \tag{3.15}$$
and since for vertex-transitive graphs $\vartheta(G)\,\vartheta(\overline{G}) \geq n$ (Lemma 23 in reference [Knu94]),
the lemma follows. □
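
Lemmas 2 and 3 together give $M_Q(G)\,M_Q(\overline{G}) = n$ for vertex-transitive graphs. As a small numerical illustration (a sketch, not part of the original text), the code below uses the known closed forms of the Lovász number for odd cycles and for their complements, $\vartheta(C_n) = n\cos(\pi/n)/(1+\cos(\pi/n))$ and $\vartheta(\overline{C}_n) = (1+\cos(\pi/n))/\cos(\pi/n)$, and checks that their product is $n$. The function names are hypothetical.

```python
import math

def theta_odd_cycle(n):
    """Closed form for the Lovasz number of the odd cycle C_n."""
    c = math.cos(math.pi / n)
    return n * c / (1 + c)

def theta_odd_antihole(n):
    """Closed form for the Lovasz number of the complement of the odd cycle C_n."""
    c = math.cos(math.pi / n)
    return (1 + c) / c

# Product of the two quantum maxima for a few odd cycles.
products = {n: theta_odd_cycle(n) * theta_odd_antihole(n) for n in (5, 7, 9, 11)}

# Reaching the quantum maximum for the complement caps S_G at n / theta(complement).
cap_for_c7 = 7 / theta_odd_antihole(7)
```

In this reading, observing the quantum maximum for $\overline{C}_7$ leaves no room above $\vartheta(C_7)$ for the heptagon.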


Proposition 3 opens the door to experimentally discarding higher-than-quantum correlations.
Specifically, lemma 2 implies that we can test whether the maximum value of correlations
with exclusivity graph $G$ goes beyond its quantum maximum without violating the
Exclusivity principle by performing an independent experiment testing correlations with
exclusivity graph $\overline{G}$ and experimentally reaching its quantum maximum [Cab13a]. A
violation of the quantum bound for $G$ in any laboratory would imply the impossibility of
reaching the quantum maximum for $\overline{G}$ in any other laboratory.


3.4 Other graph operations 

We have seen in the previous section that, using the operation of complementation and
the Exclusivity principle, we are able to explain the quantum bound and the quantum
set of distributions for many different noncontextuality inequalities. In a joint work with
Adán Cabello, we study whether something similar is possible using other graph operations.


3.4.1 Direct cosum of G' and G" 


Definition 60. Given two graphs $G'$ and $G''$ we define the direct cosum $G$ of $G'$ and
$G''$ as the graph with $V(G) = V(G') \cup V(G'')$ and such that $(u, v) \in E(G)$ iff $(u, v) \in E(G')$,
or $(u, v) \in E(G'')$, or $u \in V(G')$ and $v \in V(G'')$.
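
The definition translates directly into a few lines of code. The following sketch (hypothetical helper names, with an assumed labelling that puts one copy of $C_5$ on the even labels and the other on the odd labels of $\mathbb{Z}_{10}$) builds the direct cosum of two pentagons and compares it with the circulant graph $Ci_{10}(1,2,3,5)$.

```python
def circulant(n, jumps):
    """Edge set of the circulant graph Ci_n(jumps) on the vertices 0..n-1."""
    return {frozenset((i, (i + j) % n)) for i in range(n) for j in jumps}

def relabel(edges, mapping):
    """Rename the vertices of an edge set according to mapping."""
    return {frozenset(mapping[v] for v in e) for e in edges}

def direct_cosum(vertices1, edges1, vertices2, edges2):
    """All edges of both graphs plus every edge joining the two vertex sets."""
    cross = {frozenset((u, v)) for u in vertices1 for v in vertices2}
    return edges1 | edges2 | cross

c5 = circulant(5, [1])

# Assumed labelling: first copy on even labels, second copy on odd labels.
first = relabel(c5, {k: 2 * k for k in range(5)})
second = relabel(c5, {k: 2 * k + 1 for k in range(5)})

cosum = direct_cosum(range(0, 10, 2), first, range(1, 10, 2), second)
```

Under this labelling the edges inside each copy are the jumps of length 2, and the cross edges are exactly the jumps of odd length 1, 3 and 5.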


This operation applied to two copies of $C_5$ is illustrated¹ in figure 3.2.

¹ For $C_5$, this operation is equivalent to applying the duplication defined in subsection 3.4.2
and complementation, but this is not true in general. For general graphs $G'$ and $G''$,
$\overline{G} = \overline{G'} + \overline{G''}$, where the direct sum of graphs is defined by the disjoint union of
vertices and edges.

Figure 3.2: Two copies of the pentagon (a) and their direct cosum (b), the circulant
graph $Ci_{10}(1,2,3,5)$. In (b), one of the copies is colored in red, the other copy in green
and the edges connecting the vertices of one copy to the other are gray.

The result below is a well-known fact and can be found in reference [Knu94], but
we repeat it here to reinforce the connections with quantum theory.


Lemma 4. $\vartheta(G) = \max\{\vartheta(G'), \vartheta(G'')\}$.

Proof. Let $\{|v_i\rangle\}$ be an orthonormal representation for $G$ and $|\psi\rangle$ be a unit vector in the
same vector space. Every vertex of $G'$ is exclusive to all vertices of $G''$, which means that
the vectors of the orthonormal representation for $G'$ generate a subspace $V'$ orthogonal
to the subspace $V''$ generated by the vectors of the orthonormal representation for $G''$.
Because of this, we can decompose $|\psi\rangle$ as a sum of two orthogonal vectors:
$$|\psi\rangle = a|\psi'\rangle + b|\psi''\rangle, \quad |\psi'\rangle \in V', \quad |\psi''\rangle \in V'', \quad |a|^2 + |b|^2 = 1.$$
With these definitions we have
$$\sum_{i \in G} |\langle \psi | v_i \rangle|^2 = |a|^2 \left( \sum_{i \in G'} |\langle \psi' | v_i \rangle|^2 \right) + |b|^2 \left( \sum_{i \in G''} |\langle \psi'' | v_i \rangle|^2 \right)$$
and then
$$\vartheta(G) \leq \max\{\vartheta(G'), \vartheta(G'')\}.$$
Suppose $\max\{\vartheta(G'), \vartheta(G'')\} = \vartheta(G')$. Let $\{|v'_i\rangle\}$ be a Lovász optimal representation
for $G'$ and $|\psi'\rangle$ the unit vector achieving $\vartheta(G')$. Let $\{|v''_i\rangle\}$ be any Lovász representation
for $G''$. The set of vectors $\{|v'_i\rangle \oplus 0,\ 0 \oplus |v''_i\rangle\}$ is an optimal Lovász representation for $G$
and the unit vector $|\psi'\rangle \oplus 0$ achieves the upper bound.

□


Corollary 10. If the E principle rules out violations above the quantum maximum for $G$,
it also rules out violations above the quantum maximum for its direct cosum with any
other graph $H$ such that $\vartheta(H) \leq \vartheta(G)$. In particular, it rules out violations above the
quantum maximum for the direct cosum of $G$ with itself.


3.4.2 Twinning, partial twinning and duplication 

We can also consider graphs obtained from two copies of $G$ by adding some of the edges
between the vertices of each copy of $G$ but not all of them. One of these graphs is the
graph $T(G)$ obtained if we consider two copies of $G$ with the same labeling and join the
vertices of one of the copies with the exclusive vertices of the other copy. Figure 3.3
shows this operation applied to the pentagon. We call this operation twinning, since the
graph associated to $T(G)$ is the one obtained by twinning all the vertices of $G$.

Theorem 37. $\vartheta[T(G)] = 2\vartheta(G)$.

Proof. The upper bound $\vartheta[T(G)] \leq 2\vartheta(G)$ comes from the fact that each copy of $G$
is an induced subgraph of $T(G)$, which implies that every orthonormal representation
for the twinning includes an orthonormal representation for each copy of $G$. Equality
is reached since, given an optimal orthonormal representation $|\psi\rangle$, $\{|v_i\rangle\}_{i=1}^{n}$ for $G$, the
vectors $\{|v_i\rangle\}_{i=1}^{2n}$ with $|v_i\rangle = |v_{i+n}\rangle$ form an optimal orthonormal representation for
$T(G)$. □

The same holds true for any graph obtained from $T(G)$ by removing edges between
the two copies of $G$. We call these graphs partial twinnings of $G$. This follows from the
lemma below.


Lemma 5 (The second sandwich lemma). If $G_1 = (V, E_1)$ and $G_2 = (V, E_2)$, with $E_2 \subset E_1$
and $\vartheta(G_1) = \vartheta(G_2) = \vartheta$, then, for any $G' = (V, E)$ such that $E_2 \subset E \subset E_1$, $\vartheta(G') = \vartheta$.

Proof. Let $|\psi\rangle$, $\{|v_i\rangle\}$ be an optimal orthogonal representation for $G_1$. It is also an
orthogonal representation for $G'$, which implies that $\vartheta(G') \geq \vartheta$. Let $|\psi'\rangle$, $\{|v'_i\rangle\}$ be an
optimal orthogonal representation for $G'$. It is also an orthonormal representation for
$G_2$, which implies that $\vartheta \geq \vartheta(G')$.

□


Corollary 11. If $G'$ is a partial twinning of $G$ then $\vartheta(G') = 2\vartheta(G)$.


Proof. We apply the second sandwich lemma (lemma 5) with $G_1 = T(G)$ and $G_2$ the graph
obtained by disjoint union of two copies of $G$. □


Figure 3.3 (a) shows the twinning of $C_5$. Partial twinnings of $C_5$ can be obtained
by removing any of the ten edges present in figure 3.3 (a) and absent in figure 3.3 (c).
Figure 3.3 (b) is just a particular case of this.
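
The three constructions can be reproduced explicitly for the pentagon. The sketch below is an illustration with assumed labellings (first copy on the even labels $2k$, second copy on the odd labels $2k+5 \bmod 10$) and hypothetical helper names. It builds the duplication, the twinning and one partial twinning of $C_5$, and checks them against circulant graphs, including the isomorphism $Ci_{10}(2,3) \cong Ci_{10}(1,4)$ obtained by multiplying labels by 3 modulo 10.

```python
def circulant(n, jumps):
    """Edge set of the circulant graph Ci_n(jumps) on the vertices 0..n-1."""
    return {frozenset((i, (i + j) % n)) for i in range(n) for j in jumps}

def relabel(edges, mapping):
    """Rename the vertices of an edge set according to mapping."""
    return {frozenset(mapping[v] for v in e) for e in edges}

n = 5
c5 = circulant(n, [1])
copy1 = {k: 2 * k for k in range(n)}              # even labels
copy2 = {k: (2 * k + 5) % 10 for k in range(n)}   # odd labels

# Duplication: disjoint union of the two copies.
duplication = relabel(c5, copy1) | relabel(c5, copy2)

# Twinning: vertex i of one copy is joined to j of the other copy iff i ~ j in G.
cross = {frozenset((copy1[i], copy2[j]))
         for i in range(n) for j in range(n) if frozenset((i, j)) in c5}
twinning = duplication | cross

# One partial twinning: keep only a perfect matching among the cross edges.
matching = {frozenset((2 * k, (2 * k + 3) % 10)) for k in range(n)}
partial = duplication | matching

# Relabellings showing which circulant graphs these are isomorphic to.
times3 = {v: (3 * v) % 10 for v in range(10)}
shift_odd = {v: v if v % 2 == 0 else (v + 2) % 10 for v in range(10)}
```

The chain `duplication <= partial <= twinning` is exactly the situation covered by the second sandwich lemma.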

From theorem 37 and corollary 11 we have:


Corollary 12. If the Exclusivity principle singles out the quantum maximum for a graph 
G, it also singles out the quantum maximum for its twinning and all its partial twinnings. 


The extreme case of partial twinning presented in figure 3.3 (c) is also called the
direct sum of $G$ with itself [Knu94]. We call this operation duplication² of $G$. We can
apply this same operation on two different graphs $G'$ and $G''$, obtaining a graph $G$ with
$v(G') + v(G'')$ vertices and such that $u \sim v$ in $G$ if and only if either $u \sim v$ in $G'$ or
$u \sim v$ in $G''$. Clearly $\vartheta(G) = \vartheta(G') + \vartheta(G'')$, and we also have the trivial result that if the
Exclusivity principle singles out the quantum maximum for $G'$ and $G''$ it also singles out
the quantum maximum for $G$.

² Although the term duplication is sometimes used to refer to a different graph operation than the
one we define here, we choose this term because of its physical interpretation: for exclusivity graphs, the
duplication, as defined above, represents two independent realizations of the same experiment.

Figure 3.3: (a) The twinning of $C_5$, the circulant graph $Ci_{10}(2,3)$. (b) A partial twinning
of $C_5$, the circulant graph $Ci_{10}(2,5)$. (c) The duplication of $C_5$, the circulant graph
$Ci_{10}(2)$.


3.4.3 Vertex-transitive graphs obtained from $C_5$


Applying the operations above to $C_5$, for which the Exclusivity principle singles out the
quantum maximum, and using the results from previous sections we can explain the
quantum maximum for almost all vertex-transitive graphs with 10 vertices.

Among the vertex-transitive graphs on 10 vertices, only eight have $\vartheta(G) > \alpha(G)$: the
circulant graphs $Ci_{10}(1,2,3,5)$, $Ci_{10}(1,4)$, $Ci_{10}(2,5)$, $Ci_{10}(2,3,5)$, $Ci_{10}(1,2,3)$, $Ci_{10}(1,2)$,
and $Ci_{10}(1,2,5)$, and the Johnson graph $J(5,2)$ [Wikb, Wikd].


Proposition 4 (Amaral and Cabello). The quantum maximum for the graphs $J(5,2)$,
$Ci_{10}(1,2,3,5)$, $Ci_{10}(1,4)$, $Ci_{10}(2,5)$, $Ci_{10}(2,3,5)$ and $Ci_{10}(1,2,3)$ is the maximum value
allowed by the E principle.


Proof. Since $\vartheta(J(5,2)) = \alpha^*(J(5,2))$, the Exclusivity principle by itself explains the quantum
maximum for this graph. The graph $Ci_{10}(1,2,3,5)$ is the direct cosum of $C_5$ with
itself, $Ci_{10}(1,4)$ is the twinning of $C_5$, $Ci_{10}(2,5)$ is a partial twinning of $C_5$, $Ci_{10}(2,3,5)$
is the complement of $Ci_{10}(1,4)$, and $Ci_{10}(1,2,3)$ is the complement of $Ci_{10}(2,5)$. Hence,
the result follows from proposition 3 and corollaries 10 and 12. □
[Diagram: graphs obtained from $C_5$. Duplication + complementation: $Ci_{10}(1,2,3,5)$; twinning: $Ci_{10}(1,4)$; partial twinning: $Ci_{10}(2,5)$; twinning + complementation: $Ci_{10}(2,3,5)$; partial twinning + complementation: $Ci_{10}(1,2,3)$.]

Figure 3.4: Vertex-transitive graphs of proposition 4.


3.5 Final Remarks 

In this chapter, we have shown that the Exclusivity principle is able to single out the 
quantum maximum and even the entire set of quantum distributions in many different 
situations. The results found so far are listed below. 


1. The Exclusivity principle directly explains the quantum maximum for all graphs
with $\vartheta(G) = \alpha^*(G)$ [CSW10];

2. Given the set of quantum distributions for the complementary graph $\overline{G}$, the Exclusivity principle explains the
entire set of quantum correlations for $G$, as shown in proposition 1 [ATC14];

3. The Exclusivity principle, applied to two copies of the graph, explains the entire set
of quantum correlations for self-complementary graphs, including the pentagon,
the simplest graph exhibiting quantum contextuality, as shown in proposition 2
[Cab13b, ATC14];


4. Given the quantum maximum for $\overline{G}$, the Exclusivity principle explains the quantum
maximum for any vertex-transitive graph $G$, as shown in proposition 3 [ATC14];


5. The Exclusivity principle explains the quantum maximum for all vertex-transitive
graphs with 10 vertices, except $Ci_{10}(1,2)$ and $Ci_{10}(1,2,5)$, as shown in proposition 4;


6. Either applied to two copies of the exclusivity graph of the CHSH inequality
[FSA+13] or of a simpler inequality [Cab13b], the E principle excludes Popescu-Rohrlich
nonlocal boxes and provides an upper bound to the maximum violation
of the CHSH inequality which is close to the Tsirelson bound (see Appendix C);


7. The Exclusivity principle rules out all extremal non-quantum distributions in the
(2,2,d) Bell scenarios [FSA+13];


8. When applied to the OR product of an infinite number of copies, there is strong
numerical evidence that the E principle singles out the maximum quantum violation
of the noncontextuality inequalities whose exclusivity graph is the complement of
odd cycles on $n > 7$ vertices [CDLP13]. Indeed, it might also be the case that,
when applied to an infinite number of copies, the Exclusivity principle singles out
the Tsirelson bound of the CHSH inequality [FSA+13, Cab13b].


The simplest vertex-transitive graphs are shown in figure 3.5. The strength of the
Exclusivity principle is well illustrated by what it predicts for those
graphs. For $G = C_5$, the Exclusivity principle explains the entire set of quantum distributions.
For $C_7$ and $C_9$, there is strong numerical evidence that it explains the quantum
maximum.³ If this is indeed the case, we can also explain the quantum maximum for
$Ci_7(1,2) = \overline{C_7}$ and $Ci_9(1,2,3) = \overline{C_9}$. It might also be the case that the Exclusivity
principle explains the quantum maximum for $Ci_8(1,4)$, the exclusivity graph of the CHSH
inequality, and if this conjecture is true, it will also explain the quantum maximum for
$Ci_8(1,2) = \overline{Ci_8(1,4)}$.


³ A. Cabello, private communication.

Figure 3.5: Vertex-transitive graphs with 10 vertices or less, including $Ci_{10}(1,2,5)$, $Ci_{10}(2,3,5)$,
$Ci_{10}(1,2,3)$ and $Ci_{10}(1,2,3,5)$.

To conclude, or not to conclude?


This thesis is devoted to a mathematical presentation of some results in the quest for a 
principle that explains quantum contextuality. 

The first two chapters are devoted to setting the ground on which we work. We
define the generalized probability theories we use to describe a physical system and discuss
how contextuality arises naturally in this framework. We demand that the Exclusivity
principle be satisfied by all distributions. An open question, which we would be happy to answer
soon, is whether there is a set of axioms we could impose on these theories that guarantees
that the E principle holds while remaining compatible with quantum theory.

The original results of the author and collaborators are the focus of chapter 3. In
section 3.3 we describe the three main results of reference [ATC14]. Our first result
shows that the E principle singles out the set of quantum correlations associated to
any exclusivity graph, assuming the set of quantum correlations for the complementary
graph. This result goes beyond the one presented by Yan in [Yan13], since using the same
assumptions we have shown that the E principle singles out the entire set of quantum
correlations and not just its maximum.

Our second result states that for self-complementary graphs, the E principle, by itself,
excludes any set of correlations strictly larger than the quantum set. This shows that the
power of the E principle for singling out quantum correlations goes beyond the power of
any previously proposed principle. While previous principles cannot rule out the existence
of sets of distributions strictly larger than the quantum set in any single scenario, our
results prove that the E principle does so in many interesting ones, including the famous
and important KCBS scenario.

Finally, we have shown that, assuming only the maximum for the complementary 
graph, the E principle singles out the quantum maximum for vertex-transitive graphs. 
This allows experimental tests discarding higher-than-quantum distributions for this kind 
of dual experiment. Interestingly, the CHSH Bell inequality is one of these cases. 

Section 3.4 is devoted to unpublished results concerning graph operations other
than complementation. We use these operations to connect the quantum maximum of
different graphs. With these connections, once we prove that the E principle singles out
the quantum maximum for one graph, we are able to conclude that it also does so for many
others. Using this idea with the pentagon we show that the Exclusivity principle explains
the quantum maximum for all vertex-transitive graphs with 10 vertices, except two. If
the E principle explains the quantum bound for one of them, the result of Yan [Yan13]
proves that the E principle also explains the quantum bound for the other.

All these results still do not prove that the E principle is the principle for quantum 
correlations. However, what is clear at this point is that the E principle has a surprising 
and unprecedented power for explaining many puzzling predictions of quantum theory. 

We have many plans for the near future. One of our priorities is to conclude our
work with the graphs with 10 vertices, explaining the quantum bound for the remaining
two, a problem that has been puzzling us for a long time. We want to continue our
search for families with increasingly large ratio $\vartheta/\alpha$ and find connections of this value with
applications. We believe that there is a connection between this ratio and the advantage of
quantum strategies over classical strategies in a game. The little story of the quantum
gambler of subsection 2.11.1 is an example, but we would like to find more sophisticated
situations. We also believe that there may be a connection between this ratio (or some
other quantifier of contextuality) and amplification of randomness.

In summary, this thesis closes with some answers, and many questions.

Appendix A

The impossibility of non-contextual hidden variable models


In this chapter we will present a number of proofs of the impossibility of certain hidden- 
variable models aiming to complete quantum theory. We will show that with some 
very reasonable extra assumptions on these models, we get a contradiction with the 
predictions of quantum theory. 

The first one to present such a proof was von Neumann, and we will discuss his result
in section A.1. Several further developments were made, which culminated with the proof
of the Bell-Kochen-Specker theorem, which states the impossibility of noncontextual
hidden-variable models compatible with quantum theory. We give a proof of this theorem
using a lemma by Gleason in section A.2 and the Kochen-Specker original proof in section
A.3. We present other simple proofs in section A.4. A contextual hidden-variable model
is given in section A.5.


A.1 von Neumann 

Von Neumann was the first to rigorously establish a mathematical formulation for
quantum theory, published in his 1932 work Mathematische Grundlagen der Quantenmechanik,
and later translated to English in 1955 [vN55]. His rigorous approach
permitted him also to challenge the ideas of completion of quantum theory.

He derived the quantum formula (2.1)

$$\langle O \rangle = \mathrm{Tr}(\rho O)$$

for the expectation value of a measurement from a few general assumptions about the
expectation-value function. From this formula one can prove that there is no
dispersion-free state, and hence that hidden-variable models compatible with quantum
theory are impossible. Although one of his assumptions was wrong, as we explain later,
his result was a landmark in the foundations of physics, since it opened the door for a series
of papers on the impossibility of this kind of completion.

A.1.1 von Neumann’s assumptions 

Given a specific type of system in a probability theory, every state defines an expectation-value
function, according to definition 37,

$$\langle\,\cdot\,\rangle : \mathcal{M} \to \mathbb{R},$$

where $\mathcal{M}$ stands for the set of measurements in the model. Instead of focusing on the
possible states, von Neumann was interested in the properties of these functions, and
stated a number of requirements he believed were natural impositions on them.

Definition 61. An expectation value function $\langle\,\cdot\,\rangle : \mathcal{M} \to \mathbb{R}$ is dispersion-free if

$$\langle M^2 \rangle = \langle M \rangle^2 \qquad \text{(A.1)}$$

for every measurement $M$ allowed in the model.

Dispersion-free functions are the ones that come from states in which
all measurements have definite values, that is, for every $M$, one of the outcomes has
probability one.

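Definition 62. An expectation value function $\langle\,\cdot\,\rangle : \mathcal{M} \to \mathbb{R}$ is called pure if

$$\langle\,\cdot\,\rangle = p\,\langle\,\cdot\,\rangle' + (1-p)\,\langle\,\cdot\,\rangle'', \quad 0 < p < 1, \qquad \text{(A.2)}$$

implies that $\langle\,\cdot\,\rangle = \langle\,\cdot\,\rangle' = \langle\,\cdot\,\rangle''$.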

Pure functions are the ones that can not be written as a convex sum of others, and
$\langle\,\cdot\,\rangle$ is pure iff the state that defines it is a pure state of the system. Every dispersion-free
function is pure, but the converse is not always true. For example, in quantum theory,
pure functions are the ones defined by one-dimensional projectors, while there is no
dispersion-free function. In a hidden-variable model, the two notions coincide.
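
As a concrete illustration of the quantum case, the sketch below computes the dispersion $\langle M^2 \rangle - \langle M \rangle^2$ for a qubit in the pure state $|0\rangle$. The choice of state and observables, and the helper names, are assumptions made only for this example. The state is dispersion-free for $\sigma_z$, of which it is an eigenvector, but not for $\sigma_x$, so the expectation-value function it defines is pure without being dispersion-free.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expectation(state, op):
    """<state| op |state> for a normalized state vector and a hermitian matrix."""
    n = len(state)
    value = sum(state[i].conjugate() * op[i][j] * state[j]
                for i in range(n) for j in range(n))
    return value.real


def dispersion(state, op):
    """<op^2> - <op>^2, which vanishes exactly when the outcome of op is definite."""
    return expectation(state, matmul(op, op)) - expectation(state, op) ** 2


SIGMA_Z = [[1, 0], [0, -1]]
SIGMA_X = [[0, 1], [1, 0]]
KET_0 = [1 + 0j, 0j]  # eigenvector of sigma_z with eigenvalue +1

dispersion_z = dispersion(KET_0, SIGMA_Z)  # zero: definite outcome
dispersion_x = dispersion(KET_0, SIGMA_X)  # one: outcomes +1 and -1 equally likely
```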

In quantum theory, every measurement M is associated to an observable, a hermitian 
operator O acting on the Hilbert space of the system. Von Neumann's first assumption 
is that this correspondence is one-to-one and onto. 


Assumption 13. There is a bijective correspondence between measurements in a quantum
system and hermitian operators acting on the Hilbert space of the system.


This is not always the case, since some systems are subjected to certain superselection
rules, which forbid some hermitian operators [Wikh]. Although this is not a general
assumption, there are other formulations of von Neumann's result that circumvent this
difficulty (see [CFS70] and references therein).

Suppose a given hidden-variable model is provided that completes quantum theory.
The states of the system, now given by quantum state plus hidden variable, define
expectation-value functions acting on the set of measurements in the system, which is,
by assumption 13, the set $\mathcal{O}(\mathcal{H})$ of hermitian operators acting on the Hilbert space $\mathcal{H}$ of the
system. Then, every state in the theory is associated with an expectation value
function

$$\langle\,\cdot\,\rangle : \mathcal{O}(\mathcal{H}) \to \mathbb{R}.$$

The next step in von Neumann's approach was to impose a few assumptions on the
functions $\langle\,\cdot\,\rangle$ that he believed to be valid if these functions came from expectation values
in a given state of a real physical system.

Assumption 14. 1. If $M$ is by nature non-negative, $\langle M \rangle \geq 0$;

2. If measurement $M_1$ is associated to observable $O_1$ and $M_2$ is associated to observable
$O_2$, we can define measurement $M_1 + M_2$ and it is associated to observable
$O_1 + O_2$;

3. If $M_1, M_2, \ldots$ are arbitrary measurements,

$$\langle a_1 M_1 + a_2 M_2 + \ldots \rangle = a_1 \langle M_1 \rangle + a_2 \langle M_2 \rangle + \ldots,$$

that is, all expectation value functions are linear;

4. If measurement $M$ is associated to observable $O$ and $f : \mathbb{R} \to \mathbb{R}$ is any real
function,¹ the measurement $f(M)$ is associated to observable $f(O)$.


Theorem 38. Under assumptions 13 and 14, the expectation value functions in any
theory completing quantum theory are given by

$$\langle M \rangle = \mathrm{Tr}(O\rho), \qquad \text{(A.3)}$$

where $O$ is the observable corresponding to measurement $M$ and $\rho$ is a density operator
that depends only on the function $\langle\,\cdot\,\rangle$ (and not on the particular measurement $M$).


This result implies that, as long as we impose assumption 13 and all items of assumption 14,
we can not circumvent the quantum rule for expectation values. As we already know,
the pure functions of this form are the ones for which the associated density operator is
a one-dimensional projector $P$, and these functions only give dispersion-free expectation
values for a small number of measurements, namely, the ones for which the subspace
onto which $P$ projects is contained in an eigenspace of the associated observable. This in turn implies that
there is no dispersion-free function, proving the impossibility of hidden-variable models
compatible with quantum theory.

Von Neumann's theorem had the support of many important physicists. For a long
time, it was generally believed to demonstrate that no deterministic theory reproducing
the statistical quantum predictions was possible. In 1966, J. Bell published a paper with
some serious criticism of one of the requirements made for the expectation-value
functions [Bel66]. Von Neumann required them to be linear, which is the case for quantum
theory, but there is no physical reason to impose this property for more general theories.
In fact, as von Neumann pointed out himself in reference [vN55], the sum of measurements
$a_1 M_1 + a_2 M_2 + \ldots$ is completely meaningless when the measurements involved
are not compatible, since there is no way of constructing, in general, the corresponding
experimental setup to implement it. Thus Bell argued that in the case of incompatible
measurements, it is not reasonable to require that the expectation values necessarily
reflect the observables' algebraic relationships.

¹ Measurement $f(M)$ is defined using the following rule: measure $M$ and apply $f$ to the outcome
obtained. Observable $f(O)$ can be defined easily if we write $O$ in its spectral decomposition. Let
$O = \sum_i a_i |v_i\rangle\langle v_i|$, where $\{|v_i\rangle\}$ is an orthonormal basis for the corresponding vector space. Then
$f(O) = \sum_i f(a_i) |v_i\rangle\langle v_i|$.

It is a special property of quantum theory that the sum of the corresponding observables
corresponds to another allowed measurement (as long as assumption 13 is valid),
and the fact that the expectation value is linear is a consequence of the mathematical
rules of quantum theory and is not enforced by any general physical law. In fact, it is not
difficult to provide a hidden-variable model agreeing with quantum theory for a qubit
which does not satisfy linearity of expectation values.


Example 24 (An example of hidden-variable model). In reference [Bel66], Bell showed
an example of a hidden-variable model for a qubit, agreeing with quantum theory but
violating von Neumann's assumption of linearity. Let $A$ be a hermitian operator acting on $\mathbb{C}^2$.
Since the Pauli matrices $\sigma_i$ and the identity $I$ form a basis for the real vector space of
$2 \times 2$ hermitian operators, we can always write $A$ in the form

$$A = a_0 I + a_1 \sigma_x + a_2 \sigma_y + a_3 \sigma_z,$$

where $a_i \in \mathbb{R}$.

If we set $|a\rangle = (a_1, a_2, a_3)$, the eigenvalues of $A$, and hence the possible values of
$v(A)$, can be written as

$$v(A) = a_0 \pm \|a\|.$$

Let $|\phi\rangle \in \mathbb{C}^2$ and let $|n\rangle$ be the point on the Bloch sphere corresponding to $|\phi\rangle$. Then, we
have

$$\langle A \rangle = \langle \phi | A | \phi \rangle = a_0 + \langle a | n \rangle.$$

Together with the quantum state $|\phi\rangle$, we will use another vector $|m\rangle$ in the Bloch
sphere to represent the state of the system. This new vector plays the role of hidden
variable in the model. The complete state of the system is then given by the pair
$(|\phi\rangle, |m\rangle)$, which specifies definite outcomes for every projective measurement according
to the rule:

$$v(A) = \begin{cases} a_0 + \|a\| & \text{if } (|m\rangle + |n\rangle) \cdot |a\rangle > 0, \\ a_0 - \|a\| & \text{if } (|m\rangle + |n\rangle) \cdot |a\rangle < 0, \end{cases}$$

in which $v(A)$ is the value assigned to $A$ when the system is in the state $(|\phi\rangle, |m\rangle)$.

It is not difficult to show that this model gives the quantum statistics when we average
over the hidden variable $|m\rangle$ using the uniform measure on the sphere $S^2$. Indeed,

$$\int_{S^2} v(A)\, d|m\rangle = \langle A \rangle, \quad \forall\, |\phi\rangle.$$
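
The averaging claim can be checked numerically. The sketch below is a Monte Carlo estimate under stated assumptions: the hidden variable is sampled uniformly on the sphere by normalizing a Gaussian triple, the uniform measure is taken to be normalized to one, and the function names and parameter values are hypothetical. It also shows how the model violates linearity: for a fixed hidden variable, $v(\sigma_x + \sigma_y) = \pm\sqrt{2}$, while $v(\sigma_x) + v(\sigma_y)$ can only be $-2$, $0$ or $2$.

```python
import math
import random


def random_unit_vector(rng):
    """Uniformly distributed point on the sphere S^2 (normalized Gaussian triple)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]


def bell_value(a0, a, n, m):
    """Value v(A) assigned to A = a0*I + a.sigma by the rule of the example."""
    norm_a = math.sqrt(sum(c * c for c in a))
    sign = sum((mi + ni) * ai for mi, ni, ai in zip(m, n, a))
    return a0 + norm_a if sign > 0 else a0 - norm_a


def hidden_variable_average(a0, a, n, samples=100_000, seed=0):
    """Monte Carlo average of v(A) over the hidden variable m."""
    rng = random.Random(seed)
    total = sum(bell_value(a0, a, n, random_unit_vector(rng)) for _ in range(samples))
    return total / samples


def quantum_expectation(a0, a, n):
    """<A> = a0 + <a|n> for the state with Bloch vector n."""
    return a0 + sum(ai * ni for ai, ni in zip(a, n))


# Hypothetical test case: state |0> (Bloch vector along z) and an arbitrary observable.
bloch_n = (0.0, 0.0, 1.0)
estimate = hidden_variable_average(0.5, (0.6, 0.0, 0.8), bloch_n)
exact = quantum_expectation(0.5, (0.6, 0.0, 0.8), bloch_n)
```

With this many samples the estimate agrees with the quantum value up to statistical fluctuations of order $10^{-3}$.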


A.1.2 Functionally closed sets and von Neumann’s theorem 

We can conclude from von Neumann's result that it is not possible to reproduce the
quantum statistics with hidden-variable models that provide definite outcomes for all
observables and at the same time give rise to linear expectation-value functions. When
dealing with hidden-variable models, the assumption that all measurements have
well-defined values is mandatory, and hence we are obliged to give up the linearity
assumption. At least from the mathematical point of view, it might be interesting to do
the opposite [ZC98].

Given a quantum state $\rho$ of a system with Hilbert space $\mathcal{H}$, we will now try to solve
the following task:

Specify an extra variable and a set of observables for which it is possible to assign
definite values, in such a way that the quantum predictions for $\rho$ are recovered when
we average over all possible values of the extra variable.

Von Neumann's result shows that this set can not be the entire set of operators
acting on $\mathcal{H}$, if we assume linearity of the expectation-value functions.

Let $D(\varrho)$ be the set of all definite-valued operators for a state $\varrho$ in some theory,
where $\varrho$ corresponds to quantum state $\rho$ and possibly an extra variable. The operators
one may include in this set depend on what we use as a description of the state of the
system. For example, if the state is described according only to quantum rules (that
is, if there is no extra variable), an observable $O$ is in $D(\varrho)$ if and only if the support
of $\rho$ is included in one of the eigenspaces of $O$. If the state of the system is provided
by a hidden-variable model compatible with quantum theory, $D(\varrho)$ includes all hermitian
operators acting on $\mathcal{H}$. What structure can we assume, a priori, for the set $D(\varrho)$?

To prove his theorem, von Neumann made two assumptions about this set when the
states are given in a hidden-variable model:

1. For every state $\varrho$ in the model, $D(\varrho)$ contains all observables acting on $\mathcal{H}$;

2. The value assigned to each measurement reflects the observables' algebraic structure.
This is the content of item 3 of assumption 14.

The criticism of von Neumann's result is directed mainly at item 2.
Of course, since he was interested in ruling out hidden-variable models, item
1 was mandatory. When we demand both to be true at the same time, we reach a
contradiction. Bell found a way out of von Neumann's impossibility proof by throwing away
requirement 2. We can do the same by giving up item 1 instead of item 2.

Definition 63. We say that $A$ is *-closed if any hermitian function² of operators in $A$
is also in $A$.

² A hermitian function defined in the set of operators acting on a Hilbert space is a map that takes
hermitian operators to hermitian operators.

We will assume from now on that the set $D(\varrho)$ is *-closed.

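Definition 64. Let $A$ be a *-closed set of hermitian operators. A functional valuation
in $A$ is a map

$$\langle\,\cdot\,\rangle : A \to \mathbb{R}, \qquad O \mapsto \langle O \rangle,$$

which satisfies

$$\lim_{n \to \infty} \langle F_n \rangle = \langle F \rangle$$

whenever the sequence $F_n$ converges strongly³ to $F$.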

This is a much stronger assumption than what von Neumann demands from his 
expectation-value functions. Von Neumann assumed these functions respect linear re¬ 
lationships among the operators, while here we demand that these functions respect 
arbitrary functional relationships among the operators. 

Theorem 39. Let $D$ be a *-closed set of definite-valued operators, $d$ the set of projectors
contained in $D$ and $\rho$ a density matrix. The following are equivalent:

1. There is a probability measure $\mu$ defined in the set of all functional valuations

$$\langle\,\cdot\,\rangle : D \to \mathbb{R}$$

such that for every set of compatible operators $O_1, \ldots, O_n \in D$

$$p(o_1, \ldots, o_n | O_1, \ldots, O_n) = \mu\left(\{\langle\,\cdot\,\rangle ;\ \langle O_i \rangle = o_i\ \forall i\}\right),$$

where $p(o_1, \ldots, o_n | O_1, \ldots, O_n)$ is the probability of obtaining outcome $o_i$ in a measurement
of $O_i$ in state $\rho$.

2. $D$ is an $I$-quasiBoolean algebra, where $I = \{P \in d ;\ P\rho = 0\}$.

This means that when $D$ is an $I$-quasiBoolean algebra it is possible to attribute
definite values to its elements in such a way that we recover the quantum predictions
when averaging over all possible valuations. Moreover, this attribution is made in such
a way that all functional relations among the observables are preserved [ZC98].

This shows that there is another way around von Neumann's result. Instead of
questioning, like Bell did, the requirement of linearity of the definite values attributed to
the measurements, we drop the assumption that all observables must receive a definite
value. Then the theorem above shows that we can actually strengthen the assumption
of linearity, requiring that all functional relations be preserved, and still recover
the quantum statistics.

We may ask now whether this result has any physical interest. Clearly it can not be
used to rule out hidden-variable theories, since this requires that all measurements have
definite values. Nonetheless, this result is connected to a family of realist interpretations
of quantum theory. Each of them supplies a rule of definite-value ascription, which picks
out, from the set of all observables of a quantum system, the subset of definite-valued
observables. This family is known as modal interpretations of quantum theory [Stab].

³ If $F_n(x) \to F(x)$ for all $x$ in $\mathcal{H}$, we say that the sequence of operators $F_n$ converges strongly to $F$.


A.2 Gleason’s lemma 

In reference [Gle57], Gleason proves his famous theorem, a mathematical result which is
of particular importance for the field of quantum logic. It proves that the quantum rule for
calculating the probability of obtaining specific results of a given measurement follows
naturally from the structure of events in a real or complex Hilbert space. Although
Gleason's main result is motivated by a problem in foundations of quantum theory,
his objective had in principle nothing to do with hidden variables, which are not even
mentioned in his paper. Nevertheless, his work was of huge importance to discard the
possibility of certain hidden-variable models, and it is free of certain drawbacks present in
von Neumann's assumptions.

Gleason’s main interest was to determine all measures on the set of subspaces of a 
Hilbert space. 

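Definition 65. A measure in the set $\mathcal{S}$ of subspaces of a Hilbert space $\mathcal{H}$ is a function

$$\mu : \mathcal{S} \to [0,1] \qquad \text{(A.4)}$$

such that $\mu(\mathcal{H}) = 1$ and such that if $\{S_1, \ldots, S_n\}$ is a collection of mutually orthogonal
subspaces spanning the subspace $S$,

$$\mu(S) = \sum_{i=1}^{n} \mu(S_i). \qquad \text{(A.5)}$$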

Example 25. To every density operator $\rho$ acting on $\mathcal{H}$ corresponds a measure $\mu_\rho$ in $\mathcal{S}$
defined by

$$\mu_\rho(S) = \mathrm{Tr}(\rho P_S), \qquad \text{(A.6)}$$

where $P_S$ is the projector onto $S$.

Gleason's main result states that all measures on $\mathcal{S}$ are of the form (A.6) if the
dimension of $\mathcal{H}$ is at least three.

Definition 66. A frame function of weight $W$ for a Hilbert space $\mathcal{H}$ is a real-valued
function

$$f : \mathbb{S} \to \mathbb{R} \qquad \text{(A.7)}$$

where $\mathbb{S}$ is the unit sphere in $\mathcal{H}$, such that if $x_1, \ldots, x_n$ is an orthonormal basis for $\mathcal{H}$
then

$$\sum_i f(x_i) = W.$$

Given a non-negative frame function with weight $W = 1$, we can define a measure on
$\mathcal{S}$. For every one-dimensional subspace $S$ of $\mathcal{H}$, we define $\mu(P) = f(x)$, where $P$ is the
projector onto $S$ and $|x\rangle$ is a unit vector belonging to $S$.

Definition 67. A frame function is said to be regular if there exists a hermitian operator
$T$ acting on $\mathcal{H}$ such that

$$f(x) = \langle x | T | x \rangle$$

for all $x \in \mathbb{S}$.
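
A function of the form $f(x) = \langle x | T | x \rangle$ has the frame property because the sum of $\langle x_i | T | x_i \rangle$ over any orthonormal basis equals $\mathrm{Tr}(T)$, so its weight is $W = \mathrm{Tr}(T)$. The sketch below checks this on a real three-dimensional example; the matrix $T$ and the two bases are hypothetical choices made only for illustration, with $T$ symmetric, positive and of unit trace, so that $f$ is non-negative with weight 1.

```python
import math


def quadratic_form(t, x):
    """f(x) = <x|T|x> for a real symmetric matrix T and a real vector x."""
    n = len(x)
    return sum(x[i] * t[i][j] * x[j] for i in range(n) for j in range(n))


def frame_sum(t, basis):
    """Sum of f over the vectors of an orthonormal basis."""
    return sum(quadratic_form(t, x) for x in basis)


# Hypothetical density matrix: symmetric, positive, trace 1.
T = [[0.5, 0.1, 0.0],
     [0.1, 0.3, 0.0],
     [0.0, 0.0, 0.2]]

STANDARD_BASIS = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# A second orthonormal basis: the standard one rotated about the z axis.
c, s = math.cos(0.7), math.sin(0.7)
ROTATED_BASIS = [[c, s, 0], [-s, c, 0], [0, 0, 1]]

weight_standard = frame_sum(T, STANDARD_BASIS)
weight_rotated = frame_sum(T, ROTATED_BASIS)
```

Both sums coincide with the trace of $T$, independently of the basis chosen.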

Before stating his main theorem, Gleason proves several intermediate lemmas, among 
which is the following: 

Lemma 6. Every non-negative frame function on either a real or complex Hilbert space 
of dimension at least three is regular. 

As a consequence of this lemma, we have Gleason’s main result: 

Theorem 40. Let $\mu$ be a measure on the set $\mathcal{S}$ of subspaces of a Hilbert space of
dimension at least three. Then there exists a density matrix $\rho$ such that $\mu = \mu_\rho$.

The consequences of Gleason's theorem for the foundations of quantum theory appear
clearly if one notices that we can interpret the measure as defined not on the set of subspaces,
but on the set of corresponding orthogonal projectors. Every projector acting on $\mathcal{H}$
corresponds to an outcome of a measurement in the corresponding quantum system, and
hence a measure on $\mathcal{S}$ defines a way of calculating the probabilities of these outcomes.
What theorem 40 states is that the only way of defining these probabilities consistently
is through the quantum rule using density matrices.

This is certainly a really interesting fact, but for us the most important statement in
Gleason's paper is lemma 6. This result implies that all measures on $\mathcal{S}$ are continuous,
and this discards the possibility of certain hidden-variable models.

A.2.1 Using Gleason's Lemma to discard hidden-variable models

Let $\lambda$ define a dispersion-free state in a hidden-variable model compatible with quantum
theory describing a system whose associated Hilbert space has dimension at least three.
Then, every one-dimensional projector $P$ has a well-defined outcome for $\lambda$ and hence we
can define a measure

$$\mu_\lambda : \mathbb{S} \to \{0,1\} \qquad \text{(A.8)}$$

that takes each vector in $\mathbb{S}$ to the value associated to the projector in this direction by
$\lambda$. As a consequence of lemma 6, this measure is continuous and hence it has to be a
constant function.

To see that this is really the case, we can translate the problem of assigning values
to the points of the sphere to a problem of coloring the sphere: if the value associated
to a one-dimensional projector is 1, we paint the corresponding unit vectors in red; if
the associated value is 0, we paint the vectors in green. Suppose now that there are
two vectors with different colors. Then, if we choose a path between the corresponding
points in the sphere, we have to change abruptly from red to green somewhere on the
way from one point to the other. Hence, the association can not be done continuously
if we use both colors.

Since all associations are constant and we know that, given a pure quantum state,
there is at least one one-dimensional projector with definite outcome 1, we conclude
that for all states in the hidden-variable model and for all one-dimensional projectors the
associated definite value is 1. This clearly can not reproduce the statistics of quantum
theory.

At first sight, one may think that the argument above puts an end to the discussion
on the possibility of hidden-variable models completing quantum theory: it just can not
be done. Although the argument is very compelling, there is one extra assumption on the kind of
hidden variable considered that was not explicitly mentioned. This extra assumption seems
so natural that one may not even realize it is there. Hence, the reasoning above is not
enough to discard all kinds of hidden-variable models. It proves only that noncontextual
models are ruled out.


A.2.2 The “hidden” assumption of noncontextuality 


The implicit assumption made in the preceding argument is such that the hidden-variable
models considered are not general enough, and hence the argument can not be used to
rule out completely the possibility of completing quantum theory. It was tacitly assumed
that the measurement of an observable must yield the same outcome, regardless of what
other compatible measurements can be made simultaneously. This is the hypothesis of
noncontextuality discussed in section 2.1.


With these observations, we can conclude as a corollary of Gleason’s lemma the 
following result: 


Theorem 41 (Kochen-Specker). There is no noncontextual hidden-variable model
compatible with quantum theory.


Although this result follows from Gleason's lemma, as we proved above, this fact was
noticed only after it was proved by other means by Kochen and Specker. The advantage
of Kochen and Specker's proof is that, contrary to Gleason's lemma, it uses only a finite
number of projectors.


A.3 Kochen and Specker’s proof 

Suppose a hidden-variable model completing quantum theory is given. If we fix a quantum
state for the system and if we also fix the hidden variable, all observables are assigned
a definite value. We will denote this value for observable $O$ by $v(O)$. We will deal only
with observables whose associated operators are one-dimensional projectors, since they
are enough to get a contradiction and prove the desired result.

Since the hidden-variable models must be compatible with quantum theory,
the value $v(P)$ assigned to a projector $P$ must be one of its eigenvalues, and hence we
have

$$v(P) \in \{0,1\}. \qquad \text{(A.9)}$$

We also require that the assignment $v$ preserves the algebraic relations among compatible
operators, and hence, if $P_1, \ldots, P_n$ are orthogonal projectors such that $\sum_i P_i = I$,
we have

$$\sum_i v(P_i) = 1. \qquad \text{(A.10)}$$

This means that whenever a set of vectors $|\phi_i\rangle$ is an orthonormal basis for $\mathcal{H}$, $v(P_i) = 1$ for one,
and only one, $i$, where $P_i = |\phi_i\rangle\langle\phi_i|$ is the corresponding projector.

Although $v$ comes from a hidden-variable model, and hence is defined in the set of
observables in a quantum system, we will use the fact that we are restricted to the set
of one-dimensional projectors and consider $v$ as a function assigning values to either the
one-dimensional projectors acting on $\mathcal{H}$ or unit vectors in $\mathcal{H}$. If $P = |\phi\rangle\langle\phi|$, the value
of $v$ in both $P$ and $|\phi\rangle$ is the same:

$$v(P) = v(|\phi\rangle).$$

The idea behind Kochen and Specker's proof is to find a set of vectors in $\mathcal{H}$ in such a
way that it is impossible to assign definite values to the corresponding projectors obeying
(A.9) and (A.10). This proves the impossibility of noncontextual hidden-variable models
completing quantum theory.


Definition 68. A definite prediction set of vectors (DPS) is a set $A = \{r_1, \ldots, r_n\}$ of unit
vectors in a Hilbert space $\mathcal{H}$ such that, for at least one choice of assignment for some
$r_i$, the value of some other $r_j$ is determined by (A.9) and (A.10).


Such a set may be represented with a graph, usually called Kochen-Specker diagram. 
The vertices of the graph correspond to the vectors in the set and two vertices are 
connected by an edge if the corresponding vectors are orthogonal. In this representation, 
the problem of assigning values to the projectors can be translated into a problem of 
coloring the vertices of the graph. If a hidden-variable model assigns value 1 to the 
projector we paint the corresponding vertex in red. If the model assigns value 0 we 
paint the vertex in green. Notice that the painting is independent of other compatible 
measurements performed simultaneously, which is the assumption of noncontextuality of 
the model. 

Equation (A.10) implies a rule for the coloring: in a set of mutually orthogonal vectors, at most one can be red; if a set of vectors is an orthogonal basis for $\mathcal{H}$, one, and only one, of them is red.
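The coloring rule can be checked mechanically. The following is a minimal sketch, assuming real vectors given as tuples and a hypothetical helper name `is_valid_coloring` (neither appears in the original text); red corresponds to the value 1 and green to the value 0.

```python
from itertools import combinations

def dot(u, v):
    """Real inner product of two vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))

def is_valid_coloring(vectors, values, dim=3, tol=1e-9):
    """Check rules (A.9) and (A.10) for a 0/1 assignment on real vectors."""
    # (A.9): every assigned value must be 0 (green) or 1 (red).
    if any(val not in (0, 1) for val in values):
        return False
    n = len(vectors)
    orth = [[abs(dot(vectors[i], vectors[j])) < tol for j in range(n)]
            for i in range(n)]
    # Two orthogonal vectors can never both be red.
    for i, j in combinations(range(n), 2):
        if orth[i][j] and values[i] == 1 and values[j] == 1:
            return False
    # (A.10): every orthogonal basis contains exactly one red vector.
    for subset in combinations(range(n), dim):
        if all(orth[i][j] for i, j in combinations(subset, 2)):
            if sum(values[i] for i in subset) != 1:
                return False
    return True

basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(is_valid_coloring(basis, [1, 0, 0]))  # one red vector in the basis
print(is_valid_coloring(basis, [1, 1, 0]))  # two orthogonal red vectors
print(is_valid_coloring(basis, [0, 0, 0]))  # a basis without a red vector
```

Only the first of the three assignments respects the rule; the other two violate it in the two possible ways.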

The DPS used in Kochen and Specker's proof is composed of three-dimensional vectors, with the associated diagram shown in figure A.1. Such a set is called a KS-8 set.



Figure A.1: The set KS-8, a DPS used in the original proof of the Kochen-Specker theorem.


Theorem 42. The set KS-8 is a DPS. 

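Proof. If vector A is red, B and C must necessarily be green. If H is red, F and G are necessarily green. Suppose that A and H are both red. Since the vectors belong to a three-dimensional space, each of D and E completes an orthogonal basis together with two green vectors, so D and E are necessarily red. This is a contradiction, since D and E are orthogonal and cannot be red at the same time. Hence,

$$v(A) = 1 \Rightarrow v(H) = 0.$$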


□ 


A KS-8 can be constructed using the following vectors in three-dimensional space:

$$\begin{array}{ll}
A = (1, 0, 0) & E = (0, \cos\beta, \sin\beta) \\
B = (0, \cos\alpha, \sin\alpha) & F = (\cot\theta, 1, -\cot\beta) \\
C = (\cot\theta, 1, -\cot\alpha) & G = (\tan\theta \operatorname{cosec}\beta, -\sin\beta, \cos\beta) \\
D = (\tan\theta \operatorname{cosec}\alpha, -\sin\alpha, \cos\alpha) & H = (\sin\theta, -\cos\theta, 0).
\end{array}$$


Adding two more vectors we get another DPS, called KS-10, whose diagram is shown in figure A.2. In a KS-10, if A is red, J must necessarily be red. In fact, $v(A) = 1 \Rightarrow v(I) = 0$ and $v(H) = 0$. Since every time we have three mutually orthogonal vectors one of them must be assigned the value 1, we have $v(J) = 1$. This set is obtained if we use the vectors in KS-8 plus $I = (0, 0, 1)$ and $J = (\cos\theta, \sin\theta, 0)$.

Definition 69. A set of vectors $A = \{r_1, \ldots, r_n\}$ is called a partially non-colorable set (PNS) if there is at least one choice of assignment to some $r_i$ that makes the assignment of values to the other vectors according to rules (A.9) and (A.10) impossible.















A. The impossibility of non-contextual hidden variable models 


Figure A.2: The set KS-10, a DPS used in the original proof of the Kochen-Specker theorem.


To get a PNS we concatenate five diagrams like KS-10, which results in a set of vectors with Kochen-Specker diagram as in figure A.3, called KS-42. For such a set, the assignment of value 1 to A is impossible. In fact,

$$v(A) = 1 \Rightarrow v(A_1) = 1 \Rightarrow v(A_2) = 1 \Rightarrow v(A_3) = 1 \Rightarrow v(A_4) = 1 \Rightarrow v(J) = 1,$$

but A and J are orthogonal and hence cannot both be red.


Definition 70. A set of vectors is called a totally non-colorable set (TNS) if it is impossible to assign definite values to all vectors according to rules (A.9) and (A.10).


A TNS provides a proof of the Kochen-Specker theorem (theorem 41). In fact, a hidden-variable model compatible with quantum theory must assign values to all projectors (or equivalently, to the corresponding unit vectors) in such a way that equations (A.9) and (A.10) are obeyed. Hence, if we find a TNS we prove that noncontextual hidden-variable models compatible with quantum theory are impossible.

The sphere in any Hilbert space with dimension at least three is a TNS, as we have proven as a corollary of Gleason's lemma. Using three KS-42 sets we can build a TNS with a finite number of vectors in dimension three, simplifying the proof of theorem 41. This set is shown in figure A.4.

A set of vectors with Kochen-Specker diagram as in figure A.4 is called KS-117. This is the set used by Kochen and Specker in their proof of theorem 41.



Figure A.3: The set KS-42, a PNS used in the original proof of the Kochen-Specker theorem.


Theorem 43. It is impossible to assign definite values to the vectors of a KS-117 set according to equations (A.9) and (A.10).


Proof. The proof is quite simple. We just have to notice that the vectors I, J and K cannot be assigned the value 1, since each of them is the first vector of a KS-42 set. But they are mutually orthogonal, and hence one of them should be assigned the value 1 according to equation (A.10). □


The hard part of the proof is to show that there is a set of vectors in a Hilbert space of dimension three with this Kochen-Specker diagram. The details can be found in references [KS67, Cab96].



Figure A.4: The set KS-117, a TNS used in the original proof of the Kochen-Specker theorem.

A.4 Other additive proofs of the Kochen-Specker theorem

A.4.1 P-33

One of the simplest proofs of the Kochen-Specker theorem uses a TNS with 33 vectors in a Hilbert space of dimension three [Per91]. This TNS is known as P-33.

To simplify the notation, let $m = -1$ and $s = \sqrt{2}$. The vectors in P-33 are

$$(1,0,0), \quad (0,1,1), \quad (0,1,s), \quad (s,1,1),$$

$$(0,m,1), \quad (0,m,s), \quad (s,m,1), \quad (s,m,m),$$

and all others obtained from these by relevant permutations of the coordinates. By relevant we mean any permutation that generates a vector in a different one-dimensional subspace, since what is important for the proof is the projector on the one-dimensional subspace and not the vector itself.
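The counting implicit in this construction can be reproduced with a short script. This is only a sketch: it assumes numbers of the form $a + b\sqrt{2}$ are stored exactly as integer pairs `(a, b)`, and the names `GENERATORS`, `canonical` and `orthogonal` are introduced here for illustration.

```python
from itertools import combinations, permutations

# Numbers of the form a + b*sqrt(2) are stored exactly as integer pairs (a, b).
ZERO, ONE, M, S = (0, 0), (1, 0), (-1, 0), (0, 1)

# The eight generating vectors listed in the text (m = -1, s = sqrt(2)).
GENERATORS = [
    (ONE, ZERO, ZERO), (ZERO, ONE, ONE), (ZERO, ONE, S), (S, ONE, ONE),
    (ZERO, M, ONE), (ZERO, M, S), (S, M, ONE), (S, M, M),
]

def canonical(v):
    """Pick one representative per ray: make the first nonzero entry positive."""
    first = next(x for x in v if x != ZERO)
    if first[0] < 0 or first[1] < 0:
        return tuple((-a, -b) for a, b in v)
    return tuple(v)

def orthogonal(u, v):
    """Exact test: the dot product a + b*sqrt(2) vanishes iff a == b == 0."""
    a = sum(x[0] * y[0] + 2 * x[1] * y[1] for x, y in zip(u, v))
    b = sum(x[0] * y[1] + x[1] * y[0] for x, y in zip(u, v))
    return a == 0 and b == 0

# All coordinate permutations, with vectors spanning the same ray identified.
rays = sorted({canonical(p) for g in GENERATORS for p in permutations(g)})
triads = [t for t in combinations(rays, 3)
          if all(orthogonal(u, v) for u, v in combinations(t, 2))]

print(len(rays))    # number of distinct one-dimensional subspaces
print(len(triads))  # number of orthogonal bases contained in the set
```

Identifying vectors that differ by a sign is what implements the word "relevant" in the text: only permutations producing a new one-dimensional subspace add a ray.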

The set above has an important property: it is invariant under permutations of the axes and under a change of orientation of each axis. This allows us to assign value 1 to some vectors arbitrarily, since a different choice is equivalent to this one by an operation that leaves P-33 invariant.

The table below shows the proof that P-33 is a TNS. To simplify the notation even further, we drop the parentheses in the notation of a vector and use just abc to represent the vector (a, b, c). In the table, the three vectors of each trio are mutually orthogonal, and the remaining vectors in the line are orthogonal to the first one. The vector in the first column is assigned the value 1, and hence the other vectors in the same line are assigned the value 0. The assignment of 1 to the vector in the first column is explained in the last column.


Trio             Vectors ⊥ to the 1st    Explanation

001  100  010    110  1m0                Arbitrary choice of axis z
101  m01  010                            Arbitrary choice of orientation in axis x
011  0m1  100                            Arbitrary choice of orientation in axis y
1ms  m1s  110    s0m  0s1                Arbitrary choice between x and y
10s  s0m  010    smm                     2nd and 3rd are zero
s11  01m  smm    m0s                     2nd and 3rd are zero
s01  010  m0s    mms                     2nd and 3rd are zero
11s  1m0  mms    0sm                     2nd and 3rd are zero
01s  100  0sm    msm                     2nd and 3rd are zero
1s1  10m  msm    0ms                     2nd and 3rd are zero
100  0s1  0ms                            CONTRADICTION.


We get a contradiction in the last line: we have to assign value 1 to 100, but it has already been assigned value 0 in the first line.

In the table we used only 25 vectors, but we cannot discard the other 8, because we need
them to repeat the argument with different choices of the first vector in the first four lines. If
we used only the 25 vectors that appear in the table, we would not have a set invariant under
permutations of the axes and under a change of orientation of each axis, and the set of vectors would
not be a TNS.
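The chain of forced assignments in the table can be replayed step by step. The sketch below hard-codes one reading of the table rows (an assumption, since the rows are transcribed here by hand), stores $a + b\sqrt{2}$ exactly as `(a, b)`, and uses hypothetical names `ROWS`, `ray`, `orthogonal` and `assign`.

```python
VALUE = {"0": (0, 0), "1": (1, 0), "m": (-1, 0), "s": (0, 1)}  # a + b*sqrt(2)

def ray(label):
    """Exact vector for a label, with the overall sign fixed so that
    labels such as 01m and 0m1 denote the same ray."""
    v = tuple(VALUE[c] for c in label)
    first = next(x for x in v if x != (0, 0))
    if first[0] < 0 or first[1] < 0:
        v = tuple((-a, -b) for a, b in v)
    return v

def orthogonal(p, q):
    u, v = ray(p), ray(q)
    a = sum(x[0] * y[0] + 2 * x[1] * y[1] for x, y in zip(u, v))
    b = sum(x[0] * y[1] + x[1] * y[0] for x, y in zip(u, v))
    return a == 0 and b == 0

# (first vector, rest of the trio, other vectors orthogonal to the first)
ROWS = [
    ("001", ["100", "010"], ["110", "1m0"]),
    ("101", ["m01", "010"], []),
    ("011", ["0m1", "100"], []),
    ("1ms", ["m1s", "110"], ["s0m", "0s1"]),
    ("10s", ["s0m", "010"], ["smm"]),
    ("s11", ["01m", "smm"], ["m0s"]),
    ("s01", ["010", "m0s"], ["mms"]),
    ("11s", ["1m0", "mms"], ["0sm"]),
    ("01s", ["100", "0sm"], ["msm"]),
    ("1s1", ["10m", "msm"], ["0ms"]),
]

color = {}

def assign(label, val):
    """Record a value, failing loudly if it conflicts with an earlier one."""
    assert color.get(ray(label), val) == val, f"conflict at {label}"
    color[ray(label)] = val

for n, (first, rest, others) in enumerate(ROWS):
    trio = [first] + rest
    assert all(orthogonal(p, q) for p in trio for q in trio if p != q)
    assert all(orthogonal(first, o) for o in others)
    if n >= 4:  # from the fifth row on, the first vector is forced to be 1
        assert all(color[ray(r)] == 0 for r in rest)
    assign(first, 1)
    for label in rest + others:
        assign(label, 0)

# Last row: 0s1 and 0ms are both 0, so 100 would have to be 1.
first, rest = "100", ["0s1", "0ms"]
assert orthogonal("0s1", "0ms") and all(orthogonal(first, r) for r in rest)
forced_to_one = all(color[ray(r)] == 0 for r in rest)
already_zero = color[ray(first)] == 0
print(len(color), forced_to_one and already_zero)
```

The script reaches the last line without any internal conflict, so the only inconsistency is the one named in the text: 100 is required to be 1 although the first row fixed it to 0.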


A.4.2 Cabello’s proof with 18 vectors 


In 1996, another simple proof of the KS theorem, with 18 vectors in a four-dimensional space,
was found by Cabello et al. [CEGA96]. It was the record at the time. The TNS in this
proof is shown in figure A.5. Once more, we drop the brackets in the vectors to simplify the
notation and use $m = -1$.


In the table below, the vectors in each column are mutually orthogonal. Cells that contain the same
vector have the same color. Since we have nine columns, nine cells, and only nine, can
be assigned the value 1, one in each column. If the assignment is noncontextual, cells with
the same color must be assigned the same value. To see the contradiction, we just notice that
each vector appears in exactly two cells, and hence the number of cells assigned the value
1 must be even, whereas nine is odd.

0001  0001  1m1m  1m1m  0010  1mm1  11m1  11m1  111m
0010  0100  1mm1  1111  0100  1111  111m  m111  m111
1100  1010  1100  10m0  1001  100m  1m00  1010  1001
1m00  10m0  0011  010m  100m  01m0  0011  010m  01m0


Figure A.5: The set used in Cabello’s proof of the Kochen-Specker theorem using 18 
vectors. 
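Both ingredients of this argument, orthogonality within each column and the parity count, can be checked directly. The sketch below hard-codes one reading of the nine columns of figure A.5 (an assumption, since the figure is transcribed here by hand) and also confirms by exhaustive search that no admissible assignment exists.

```python
from collections import Counter
from itertools import combinations, product

# One list per column of the table (m stands for -1).
COLUMNS = [
    ["0001", "0010", "1100", "1m00"],
    ["0001", "0100", "1010", "10m0"],
    ["1m1m", "1mm1", "1100", "0011"],
    ["1m1m", "1111", "10m0", "010m"],
    ["0010", "0100", "1001", "100m"],
    ["1mm1", "1111", "100m", "01m0"],
    ["11m1", "111m", "1m00", "0011"],
    ["11m1", "m111", "1010", "010m"],
    ["111m", "m111", "1001", "01m0"],
]

def vec(label):
    return [-1 if c == "m" else int(c) for c in label]

def orthogonal(p, q):
    return sum(a * b for a, b in zip(vec(p), vec(q))) == 0

# Every column must be an orthogonal basis of the four-dimensional space.
all_bases_ok = all(orthogonal(p, q)
                   for col in COLUMNS for p, q in combinations(col, 2))

# Every vector must appear in exactly two cells.
counts = Counter(label for col in COLUMNS for label in col)
vectors = sorted(counts)

# Exhaustive search over the 2**18 noncontextual 0/1 assignments.
index = {label: k for k, label in enumerate(vectors)}
columns_idx = [[index[label] for label in col] for col in COLUMNS]
solutions = sum(
    all(sum(bits[k] for k in col) == 1 for col in columns_idx)
    for bits in product((0, 1), repeat=len(vectors))
)

print(all_bases_ok, len(vectors), set(counts.values()), solutions)
```

The exhaustive search is redundant given the parity argument, but it makes the conclusion independent of it.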


A.4.3 The simplest proof of the Kochen-Specker theorem 

Any TNS shown above provides a proof for the Kochen-Specker theorem and the impossibility of 
noncontextual hidden variable models is established. Nevertheless, from a physical point of view, 
there is still a lot of work to be done. The validity of the theorem should be experimentally 
verified, and hence people started to work on experimental implementations of such proofs 
[TKL+13] . 

The need for an experimental verification of this result is what makes the improvement made
by Kochen and Specker's original proof so important: in Gleason's proof, we need an infinite
number of vectors to reach a contradiction, and this, of course, makes any experimental test
of the result impossible. In the original proof of Kochen and Specker the set of vectors used is
finite, but it is very large. Any experimental arrangement involving 117 measurements is
hard to implement with small error.

Many proofs were derived after Kochen and Specker's work, with the objective of simplifying the
TNS used. Among the additive proofs (those relying on equation (A.10)), the proof presented
in section A.4.2 still holds the record for the smallest number of vectors in the set. But a proof
with few vectors is not necessarily the simplest proof for an experimentalist. The number of
different measurement setups is related to the number of contexts, and hence it might be better
in some situations to seek a set with the smallest number of contexts. In this sense, the
simplest proof known was presented in reference [LBPC14]. The 21 vectors used are shown in
figure A.6. The Kochen-Specker diagram of this set is shown in figure A.7.


In the table of figure A.6 the vectors in each column are mutually orthogonal. Cells that contain the
same vector have the same color. Since we have seven columns, seven cells, and only
seven, can be assigned the value 1, one in each column. If the assignment is noncontextual,
cells with the same color must be assigned the same value. To see the contradiction, once
more we notice that each vector appears in exactly two cells, and hence the number of
cells assigned the value 1 must be even, whereas seven is odd.
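The structure used in this argument can be verified numerically. The sketch below assumes that $\omega = e^{2\pi i/3}$ (the text of this section does not restate its definition), writes `w` for $\omega$ and `W` for $\omega^2$, and hard-codes one reading of the seven columns of figure A.6.

```python
import cmath
from collections import Counter
from itertools import combinations

OMEGA = cmath.exp(2j * cmath.pi / 3)  # assumed value of omega
ENTRY = {"0": 0, "1": 1, "w": OMEGA, "W": OMEGA ** 2}

# One list per column (context) of the table.
BASES = [
    ["100000", "010000", "001000", "000100", "000010", "000001"],
    ["100000", "001111", "0101wW", "0110Ww", "01wW01", "01Ww10"],
    ["010000", "001111", "1001Ww", "1010wW", "10Ww01", "10wW10"],
    ["001000", "0101wW", "1001Ww", "110011", "wW0101", "Ww0110"],
    ["000100", "0110Ww", "1010wW", "110011", "Ww1001", "wW1010"],
    ["000010", "01wW01", "10Ww01", "wW0101", "Ww1001", "111100"],
    ["000001", "01Ww10", "10wW10", "Ww0110", "wW1010", "111100"],
]

def inner(p, q):
    """Hermitian inner product of two labelled vectors."""
    return sum(complex(ENTRY[a]).conjugate() * ENTRY[b] for a, b in zip(p, q))

# Every column is an orthogonal basis of the six-dimensional space.
all_bases_ok = all(abs(inner(p, q)) < 1e-9
                   for basis in BASES for p, q in combinations(basis, 2))

# Every vector lies in exactly two bases, and any two bases share one vector.
counts = Counter(label for basis in BASES for label in basis)
shared = [len(set(b1) & set(b2)) for b1, b2 in combinations(BASES, 2)]

# Parity: the number of cells with value 1 is even, but seven are required.
parity_obstruction = len(BASES) % 2 == 1 and set(counts.values()) == {2}

print(all_bases_ok, len(counts), set(shared), parity_obstruction)
```

That any two bases share exactly one vector is what allows each vector to be labeled by a pair of bases, as in the caption of figure A.7.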


100000   100000   010000   001000   000100   000010   000001
010000   001111   001111   0101ωω²  0110ω²ω  01ωω²01  01ω²ω10
001000   0101ωω²  1001ω²ω  1001ω²ω  1010ωω²  10ω²ω01  10ωω²10
000100   0110ω²ω  1010ωω²  110011   110011   ωω²0101  ω²ω0110
000010   01ωω²01  10ω²ω01  ωω²0101  ω²ω1001  ω²ω1001  ωω²1010
000001   01ω²ω10  10ωω²10  ω²ω0110  ωω²1010  111100   111100

Figure A.6: The set used in the simplest proof of the Kochen-Specker theorem using 21
vectors and 7 contexts.


Figure A.7: The Kochen-Specker diagram of the TNS. The vector labeled by ij is the vector
common to the i-th and j-th bases.


A.4.4 Multiplicative proofs of the Kochen-Specker theorem

In the previous proofs of the Kochen-Specker theorem, we have used the sum of compatible
operators and the fact that the values assigned by a hidden-variable model to the observables
should obey the same linear relations as the corresponding operators. More generally, we can
assume that, for compatible operators, the validity of

$$f(A_1, \ldots, A_n) = 0$$

implies that

$$f(v(A_1), \ldots, v(A_n)) = 0,$$

for any function $f$.

This allows the construction of proofs of the Kochen-Specker theorem with different functions
$f$. Examples of such proofs are the multiplicative ones we discuss below. In this kind
of argument, we use the fact that a set of compatible operators obeys the relation

$$A_1 \times \cdots \times A_n = B$$

to impose the condition

$$v(A_1) \times \cdots \times v(A_n) = v(B). \tag{A.11}$$


The Peres-Mermin square


A simple multiplicative proof of the Kochen-Specker theorem uses the set of operators known
as the Peres-Mermin square [Mer90, Per90]:

$$\begin{array}{lll}
A_1 = \sigma_x \otimes I & A_2 = I \otimes \sigma_x & A_3 = \sigma_x \otimes \sigma_x \\
A_4 = I \otimes \sigma_y & A_5 = \sigma_y \otimes I & A_6 = \sigma_y \otimes \sigma_y \\
A_7 = \sigma_x \otimes \sigma_y & A_8 = \sigma_y \otimes \sigma_x & A_9 = \sigma_z \otimes \sigma_z.
\end{array} \tag{A.12}$$

It is not possible to assign definite values $v(A_i)$ to all of these observables in such a way
that the value assigned to each operator is one of its eigenvalues and (A.11) is satisfied. This
happens because this set of operators has the following properties:


1. The three operators in each line and in each column are compatible;

2. The product of the operators in the last column is $-I$. The product of the operators in
each of the other columns and in each line is $I$.


Using equation (A.11), we have

$$\begin{aligned}
P_1 &= v(A_1) v(A_2) v(A_3) = 1, \\
P_2 &= v(A_4) v(A_5) v(A_6) = 1, \\
P_3 &= v(A_7) v(A_8) v(A_9) = 1, \\
P_4 &= v(A_1) v(A_4) v(A_7) = 1, \\
P_5 &= v(A_2) v(A_5) v(A_8) = 1, \\
P_6 &= v(A_3) v(A_6) v(A_9) = -1,
\end{aligned} \tag{A.13}$$

and hence, since each $v(A_i)$ appears exactly once in the product of the lines and exactly once in the product of the columns,

$$1 = P_1 P_2 P_3 = P_4 P_5 P_6 = -1,$$

which is a contradiction. This proves that the Peres-Mermin square provides a multiplicative
proof of the Kochen-Specker theorem. The assumption of noncontextuality appears clearly in
equations (A.13), since we assumed that each observable has the same value regardless of whether
it is measured together with the compatible observables appearing in the same line or with those in
the same column.
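The two properties of the square, and the resulting impossibility, can be confirmed with a few lines of code. This is a sketch using plain nested lists for the matrices; the helper names `kron`, `matmul`, `sign_of_product` and `commute` are introduced here for illustration.

```python
from itertools import combinations, product

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def kron(a, b):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# The nine observables of (A.12).
A = {1: kron(X, I2), 2: kron(I2, X), 3: kron(X, X),
     4: kron(I2, Y), 5: kron(Y, I2), 6: kron(Y, Y),
     7: kron(X, Y), 8: kron(Y, X), 9: kron(Z, Z)}
LINES = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
COLUMNS = [(1, 4, 7), (2, 5, 8), (3, 6, 9)]

def sign_of_product(context):
    """Return +1 or -1 if the product of the context is +I or -I."""
    prod = A[context[0]]
    for k in context[1:]:
        prod = matmul(prod, A[k])
    n = len(prod)
    for s in (1, -1):
        if all(prod[i][j] == (s if i == j else 0)
               for i in range(n) for j in range(n)):
            return s
    raise ValueError("product is not proportional to the identity")

def commute(a, b):
    return matmul(a, b) == matmul(b, a)

contexts = LINES + COLUMNS
compatible = all(commute(A[i], A[j])
                 for c in contexts for i, j in combinations(c, 2))
signs = [sign_of_product(c) for c in contexts]

# Search all 2**9 assignments of +1/-1 for one satisfying (A.13).
solutions = 0
for values in product((1, -1), repeat=9):
    v = dict(zip(range(1, 10), values))
    if all(v[i] * v[j] * v[k] == s for (i, j, k), s in zip(contexts, signs)):
        solutions += 1

print(compatible, signs, solutions)
```

The list `signs` follows the order of equations (A.13): three lines, then three columns.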


A simple proof in dimension 8

Another simple multiplicative proof of the Kochen-Specker theorem is provided by the set of
observables

$$\begin{array}{ll}
A_1 = \sigma_y \otimes I \otimes I & A_2 = \sigma_x \otimes \sigma_x \otimes \sigma_x \\
A_3 = \sigma_y \otimes \sigma_y \otimes \sigma_x & A_4 = \sigma_y \otimes \sigma_x \otimes \sigma_y \\
A_5 = \sigma_x \otimes \sigma_y \otimes \sigma_y & A_6 = I \otimes I \otimes \sigma_x \\
A_7 = I \otimes I \otimes \sigma_y & A_8 = \sigma_x \otimes I \otimes I \\
A_9 = I \otimes \sigma_y \otimes I & A_{10} = I \otimes \sigma_x \otimes I.
\end{array}$$


The contradiction we get when we assign definite values to these observables is easily
understood if we arrange them in a star, as shown in figure A.8. The operators are arranged
in five lines with four operators each: $A_1 A_3 A_6 A_9$, $A_1 A_4 A_7 A_{10}$, $A_2 A_3 A_4 A_5$, $A_2 A_6 A_8 A_{10}$ and
$A_5 A_7 A_8 A_9$. The following properties hold:



Figure A.8: Observables providing a proof of Kochen-Specker theorem in dimension 8. 


1. The observables in each line are compatible;

2. The product of the observables that appear in the horizontal line $A_2 A_3 A_4 A_5$ is $-I$, and the
product of the observables in every other line is $I$.

These properties imply that the values assigned by a hidden-variable model must obey

$$P_1 = v(A_1) v(A_3) v(A_6) v(A_9) = 1, \tag{A.14a}$$

$$P_2 = v(A_1) v(A_4) v(A_7) v(A_{10}) = 1, \tag{A.14b}$$

$$P_3 = v(A_2) v(A_6) v(A_8) v(A_{10}) = 1, \tag{A.14c}$$

$$P_4 = v(A_5) v(A_7) v(A_8) v(A_9) = 1, \tag{A.14d}$$

$$P_5 = v(A_2) v(A_3) v(A_4) v(A_5) = -1. \tag{A.14e}$$


This leads to a contradiction: each observable belongs to exactly two lines, so the validity of the equations above would imply

$$-1 = P_1 P_2 P_3 P_4 P_5 = \prod_i v(A_i)^2 = 1.$$
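The same kind of check works for the star. In the sketch below each observable is written as a three-letter word over I, X, Y; the entry for $A_2$ is taken to be $\sigma_x \otimes \sigma_x \otimes \sigma_x$, which is an assumption fixed by requiring the line $A_2 A_6 A_8 A_{10}$ to multiply to $I$. The helper names are illustrative.

```python
from collections import Counter
from itertools import combinations, product

PAULI = {"I": [[1, 0], [0, 1]],
         "X": [[0, 1], [1, 0]],
         "Y": [[0, -1j], [1j, 0]]}

def kron(a, b):
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def operator(word):
    """Tensor product of single-qubit operators, e.g. 'YII' -> Y (x) I (x) I."""
    m = [[1]]
    for letter in word:
        m = kron(m, PAULI[letter])
    return m

# A2 = XXX is an assumption (see the lead-in); the rest follows the text.
WORDS = {1: "YII", 2: "XXX", 3: "YYX", 4: "YXY", 5: "XYY",
         6: "IIX", 7: "IIY", 8: "XII", 9: "IYI", 10: "IXI"}
A = {k: operator(w) for k, w in WORDS.items()}
LINES = [(1, 3, 6, 9), (1, 4, 7, 10), (2, 6, 8, 10), (5, 7, 8, 9), (2, 3, 4, 5)]

def sign_of_product(line):
    prod = A[line[0]]
    for k in line[1:]:
        prod = matmul(prod, A[k])
    n = len(prod)
    for s in (1, -1):
        if all(prod[i][j] == (s if i == j else 0)
               for i in range(n) for j in range(n)):
            return s
    raise ValueError("product is not proportional to the identity")

compatible = all(matmul(A[i], A[j]) == matmul(A[j], A[i])
                 for line in LINES for i, j in combinations(line, 2))
signs = [sign_of_product(line) for line in LINES]
membership = Counter(k for line in LINES for k in line)

# Search all 2**10 assignments of +1/-1 for one satisfying (A.14a)-(A.14e).
solutions = 0
for values in product((1, -1), repeat=10):
    v = dict(zip(range(1, 11), values))
    if all(v[a] * v[b] * v[c] * v[d] == s
           for (a, b, c, d), s in zip(LINES, signs)):
        solutions += 1

print(compatible, signs, set(membership.values()), solutions)
```

The lines are listed in the order of equations (A.14a) to (A.14e), so the last sign corresponds to the horizontal line.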


A.5 A contextual hidden-variable model 

The Kochen-Specker theorem forbids noncontextual hidden-variable models, but it is possible
to complete quantum theory in order to give definite values for all projective measurements, as
long as we drop the assumption of noncontextuality. An example of such a model is provided by
Bell in reference [Bel66].


To define a hidden-variable model it suffices to define the values $v(P)$ attributed to the projectors
$P$. This happens because every Hermitian operator can be written as a linear combination
of compatible projectors

$$A = \sum_i \lambda_i P_{\phi_i},$$

in which $\lambda_i$ is the eigenvalue of $A$ corresponding to the eigenvector $|\phi_i\rangle$. As we can choose the $|\phi_i\rangle$
mutually orthogonal, we can assume that $[P_{\phi_i}, P_{\phi_j}] = 0$ and hence they are mutually compatible.
Since the assignment $v$ must preserve the linear relationships between compatible operators, we
have

$$v(A) = \sum_i \lambda_i v(P_{\phi_i}).$$

Suppose an experimental arrangement performs the measurement of the observables represented
by the projectors $P_{\phi_1}, \ldots, P_{\phi_n}$. Let us define the numbers $a_i \in \mathbb{R}$ such that the expectation
values of $P_{\phi_1}, \ldots, P_{\phi_n}$ are $a_1, a_2 - a_1, a_3 - a_2, \ldots, a_n - a_{n-1}$, respectively, with the convention $a_0 = 0$. As hidden variable
we will use a real number $\lambda$ between zero and one. The value associated to the projector $P_{\phi_i}$ if the
value of the hidden variable is $\lambda$ is

$$v(P_{\phi_i}) = \begin{cases} 1 & \text{if } a_{i-1} < \lambda < a_i, \\ 0 & \text{otherwise.} \end{cases}$$

Notice that the value of each $a_i$ depends on the entire set of projectors being measured.
Hence the value of $v(P_{\phi_i})$ does not depend just on the quantum state of the system and the
hidden variable $\lambda$; it depends also on which other projectors are being measured with $P_{\phi_i}$. This
means that this is a contextual hidden-variable model.

To show that this model agrees with the quantum predictions, we notice that

$$\langle P_{\phi_i} \rangle = \int_0^1 v(P_{\phi_i}) \, d\lambda = a_i - a_{i-1}.$$
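A small numerical sketch may help to see both features of this model: it reproduces the expectation values, and the value of a given projector depends on the context. The function name `bell_model` and the two contexts with their expectation values are hypothetical, chosen only for illustration; the upper end of each interval is taken as closed so that every value of the hidden variable selects exactly one projector, which does not change any integral.

```python
def bell_model(expectations):
    """Build v(i, lam) for one context, given the expectation values of its
    projectors in the order in which they are listed (they must sum to 1)."""
    # Cumulative thresholds a_0 = 0, a_1, ..., a_n = 1.
    thresholds = [0.0]
    for p in expectations:
        thresholds.append(thresholds[-1] + p)

    def v(i, lam):
        # Value of the i-th projector (1-based) for hidden variable lam.
        # Assumption: the right end of the interval is included.
        return 1 if thresholds[i - 1] < lam <= thresholds[i] else 0

    return v

def average(v, i, samples=1000):
    """Midpoint-rule estimate of the integral of v(i, lam) over [0, 1]."""
    return sum(v(i, (k + 0.5) / samples) for k in range(samples)) / samples

# Two hypothetical contexts containing the same projector P, whose
# expectation value is 0.5 in both: (P, Q, R) and (Q', P, R').
context_a = [0.5, 0.3, 0.2]
context_b = [0.1, 0.5, 0.4]
v_a, v_b = bell_model(context_a), bell_model(context_b)

# The model reproduces the expectation value of P in both contexts...
print(average(v_a, 1), average(v_b, 2))
# ...but for a fixed hidden variable the value of P depends on the context.
print(v_a(1, 0.05), v_b(2, 0.05))
```

For the hidden variable 0.05 the projector P takes the value 1 in the first context and 0 in the second, although its expectation value is the same in both.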

This model is quite artificial, but it is important conceptually to show that the hypothesis 
of noncontextuality in the Kochen-Specker theorem is essential to discard the possibility of 
hidden-variable models. It shows that the completion of quantum theory is possible, and brings 
hope for those who doubt the fact that nature could be intrinsically probabilistic. But one 
important remark must be made. Hidden-variable theories were first imagined by people who 
believed that the world could not behave in such a counter-intuitive manner. The main point 
was to recover the notion we have in classical theory that every measurement has a definite 
outcome, that exists prior to the measurement and is only revealed when the measurement 
is performed. If we choose to keep this line of thought, the Kochen-Specker theorem forces 
contextuality on our theories, which is also a really intriguing feature, not present in classical 
theories. Hence, if quantum theory is really correct, and so far there is no reason to believe it 
is not, we have to accept the fact that things are a bit weird and our intuition, modeled by our 
experience with classical systems, can not be applied to explain its phenomena. 

There also exist other state-independent proofs with a smaller number of observables. In
reference ?? the authors present a proof of the Kochen-Specker theorem with 13 vectors. The
idea of the proof is quite different from the additive and multiplicative proofs we have shown
above. It is based on the violation of an experimentally testable inequality involving only 13
observables that is satisfied by all noncontextual models while being violated by all qutrit states.

A.6 Final Remarks

In the classical description of physical systems, probabilities come from our lack of knowledge 
about the past history of the system, or due to practical problems that come when we deal with a 
huge number of particles at the same time. Every system has well defined values for all physical 
quantities, that are merely revealed by the measurements. The impossibility of accessing these 
values was believed to be a technological and practical issue and not a fundamental limit 
imposed by nature on the information we can gain when interacting with a system. 

This reasoning can not be applied to quantum theory. Since the development of its modern 
mathematical formulation in the 1920's, this intrinsic probabilistic behavior has been puzzling 
physicists and philosophers of science, experts and non-experts all around the world. Is it a 
flaw on the mathematical structure of the theory? Would it be possible to complete quantum 
theory in order to predict with certainty the outcomes of each measurement and still recover 
the quantum statistics? 

In this chapter we have shown that if this completion is required to be noncontextual, it
is not possible. The first attempt was made by von Neumann in 1932. He showed that, under
some assumptions, the expectation-value functions in the hidden-variable models should obey
the quantum rule, and hence could not be dispersion-free. His argument, though, discards
only a very restricted class of hidden-variable models, since he made the strong assumption that
expectation-value functions should reproduce the algebraic relations among the observables,
even if the observables are not compatible. Although this is the case for quantum theory, we
cannot justify this assumption physically and hence we should not impose it on our models. In
fact, a simple hidden-variable model for a qubit system is provided by Bell as a counterexample
to von Neumann's result.

More successful results appeared with the work of Kochen and Specker. Their main theorem
states that for systems with dimension 3 or higher, noncontextual hidden-variable models
recovering the quantum statistics are not possible. The noncontextuality assumption requires
that the value assigned to a measurement does not depend on other compatible measurements
performed together. The same result can be proven with the help of Gleason's lemma, with
the drawback that the number of vectors in the proof is infinite. After Kochen and Specker's
original proof, many others have been derived. The advantage of these proofs is that they
are much simpler than the first and hence may be more suitable for experimental implementations.
We have discussed some of these proofs above, but many more are known. We refer to
[Cab96, TKL+13] for more details.

As shown by Bell, it is possible to construct a contextual hidden-variable model for any set 
of measurements in any dimension. Although this model is quite artificial, it proves that the 
assumption of noncontextuality is crucial in the Kochen-Specker theorem. 

In summary, what we learn with this result is that to reconcile the quantum formalism with 
the notion of well defined physical properties of classical intuition, we must accept contextuality, 
which is also a very counter-intuitive property. How could the value of one physical quantity 
depend on what other properties are jointly measured? The Kochen-Specker theorem implies 
that there is no way out: the mathematical description of quantum systems does not agree 
with the classical idea of pre-defined physical quantities.


Appendix B

Non-locality


Historically, the discussion of nonlocality in quantum theory preceded the discussion about its
noncontextual character. It started around 1935, when Einstein, Podolsky and Rosen noticed
that the way of thinking of classical physics does not apply directly to quantum systems [EPR35].
They started one of the greatest debates in the foundations of physics and the philosophy of science in
general, which is still fruitful nowadays.

The classical world consists of objects with precise physical attributes: position, mass,
velocity, orientation, charge, etc. This is how physicists were used to thinking for centuries. Their
job was to understand the connection between these attributes and create mathematical objects
that mimic these relations. A theory built for this purpose would be considered satisfactory if
every relevant physical attribute has a counterpart in the theory and if the relations and results
predicted by this correspondence agree with what is observed in real situations.

This line of thought led many scientists, including Einstein, Podolsky and Rosen, to conjecture
the existence of a more complete theory behind the quantum formalism. The intrinsic
probabilistic character of quantum measurements should be the result of the lack of knowledge
about the past history of the system and a more adequate theory should be conceived that
predicted all these results with certainty.

This is the same reasoning that we used to conjecture the existence of hidden-variable
models completing quantum theory. Einstein, Podolsky and Rosen believed that such a model
would be possible. In this chapter we prove that under the assumption of locality, this kind of
model does not exist.

The first one to provide a proof of the impossibility of local hidden-variable models was
John Bell, in 1964 [Bel64]. He demonstrated that if the statistics of joint measurements on a
pair of qubits in the singlet state were given by a local hidden-variable model, a linear inequality
involving the corresponding probabilities should be satisfied. A simple choice of measurements
leads to a violation of this inequality, and hence the model cannot reproduce the quantum
statistics.

Many similar inequalities have been derived since Bell's work. Because of his pioneering paper,
any inequality derived under the assumption of a local hidden-variable model is called a Bell
inequality. Quantum theory violates these inequalities in many situations. Besides the insight
they give into the foundations of quantum theory, those violations are also connected to many interesting
applications.

The pioneering paper of Einstein, Podolsky and Rosen is discussed in section B.1. Hidden-variable
models are introduced in section B.2 and Bell's proof of the impossibility of such
models in section B.3. Other proofs based on Bell inequalities are presented in section B.4 and
their connection with convex geometry in section B.5. We finish with our final remarks in section B.6.


B.1 The EPR paradox

In 1935, Einstein, Podolsky and Rosen published one of the most important and cited papers
in quantum information theory and in the foundations of quantum mechanics. In their letter,
entitled “Can Quantum-Mechanical Description of Physical Reality Be Considered Complete?"
[EPR35], the authors argue that for a physical theory to be considered complete, every
quantity with physical reality has to be predicted with certainty by the theory. As we know,
non-commuting observables in quantum theory can never have definite values simultaneously,
and hence we must accept one of two possible situations: either quantum theory does not
provide a complete description of nature or two non-commuting observables cannot both have
physical reality. They present arguments discarding the second option, and hence they believed
that quantum theory could not be considered complete.

According to EPR, when we analyze the success of a theory, we must ask two questions: 

1. Is the theory correct? 

2. Is the theory complete? 

The answer to question number 1 is 'yes' if the predictions of the theory agree with all data
available from experimentation in real physical systems. Of course, it is always possible that
a theory considered correct is at some point contradicted by more modern and advanced
experimental setups, and if this happens physicists should seek different theories capable of
describing the new results. At that time, as nowadays, the answer to this question for
quantum theory is 'yes'.

The concept of a complete theory is more delicate and it is not easy to define. EPR argued 
that any reasonable definition for completeness must end in a concept for which the following 
condition is necessary: 

“Every element of the physical reality must have a counterpart in the physical theory." 

The concept of physical reality is also delicate, but they provide a condition they consider 
to be sufficient for a physical quantity to be called an element of reality: 

“If, without in any way disturbing a system, we can predict with certainty the value of a physical
quantity, then there exists an element of physical reality corresponding to this physical
quantity."

For them, a physical theory can only be considered satisfactory if it is both correct and 
complete. 

In classical theory, once we have full information about the system, that is, if we have a
pure state, all measurements have definite values. Therefore, every quantity corresponds to an
element of reality and classical theory is complete. On the other hand, quantum theory does
not predict the outcomes of every measurement even if the system is in a pure state. This
can only be done if the state is an eigenvector of the corresponding operator, and hence two
non-commuting operators cannot both have definite values in every state.

Consider, for example, the quantum system of one qubit. If a qubit is in state $|\psi\rangle = |0\rangle$, we
can predict that a measurement of the observable $\sigma_z$ will certainly have outcome 0. If instead
we measure $\sigma_x$, each possible outcome occurs with equal probability.

These observations and EPR's assumptions lead to the conclusion that one of two conditions
must hold:

1. Quantum theory is not complete; 

2. Two non-commuting observables can not represent elements of reality at the same time. 


In fact, if quantum theory was complete and both observables corresponded to elements 
of reality, both should have definite values predicted by the theory for all pure states, which is 
certainly not possible. 

Let us now see how EPR discard option 2. Suppose we have a pair of quantum systems that
have interacted in the past and are in the composite state $|\Psi\rangle$. Suppose we want to measure two observables
$M$ and $N$ in the first system and let $\{|u_1\rangle, \ldots, |u_m\rangle\}$ and $\{|v_1\rangle, \ldots, |v_n\rangle\}$ be the eigenvectors of
$M$ and $N$, respectively. Then we can decompose $|\Psi\rangle$ in two different ways:

$$\begin{aligned}
|\Psi\rangle &= \sum_{i=1}^{m} |u_i\rangle \otimes |\mu_i\rangle \\
|\Psi\rangle &= \sum_{i=1}^{n} |v_i\rangle \otimes |\nu_i\rangle
\end{aligned} \tag{B.1}$$

where $|\mu_i\rangle$ and $|\nu_i\rangle$ are pure states for the second system. Suppose now that measurement $M$
was performed in the first system. The state of the composite system after the measurement,
if outcome $i$ was obtained, is $|u_i\rangle \otimes |\mu_i\rangle$, and the second system can be described by the state
$|\mu_i\rangle$. On the other hand, if measurement $N$ was performed in the first system, the state of
the composite system after the measurement, if outcome $i$ was obtained, is $|v_i\rangle \otimes |\nu_i\rangle$, and the
second system is left in state $|\nu_i\rangle$.

EPR now argue that, since nothing was done to the second system, the physical reality
of this system is the same for both options, and hence $|\mu_i\rangle$ and $|\nu_i\rangle$ describe the same physical
reality.

Suppose now that the vectors $|\mu_i\rangle$ are eigenvectors of an observable $M'$ in the second
system and the vectors $|\nu_i\rangle$ are eigenvectors of an observable $N'$ in the second system, not
commuting with $M'$. This can be the case in some situations, as we show in example 26 below.

If we measure $M$ in the first system, we can predict with certainty the outcome of $M'$ in the
second system, without disturbing the second system, since we have not interacted with it at
any point during the measurement. On the other hand, if we measure $N$ in the first system, we
can predict with certainty the outcome of $N'$ in the second system, again without disturbing
it. Hence, both $M'$ and $N'$ must correspond to elements of reality, which in turn proves that
condition 2 is not true. Thus, we have no option but to accept the fact that quantum theory
is not complete.


Example 26 (The Singlet). The state of two qubits given by

$$|\Psi_-\rangle = \frac{|01\rangle - |10\rangle}{\sqrt{2}}, \tag{B.2}$$

called the singlet, can be used to exemplify the situation mentioned above. Up to a global phase, this state can also be written as

$$|\Psi_-\rangle = \frac{|{+}{-}\rangle - |{-}{+}\rangle}{\sqrt{2}}. \tag{B.3}$$

If we use equation (B.2), we see that a measurement of $\sigma_z$ in the first qubit allows the prediction of the result of the same measurement in the second qubit. On the other hand, if we use equation (B.3), we see that a measurement of $\sigma_x$ in the first qubit allows the prediction of the result of the same measurement in the second qubit.
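
The two ways of writing the singlet can be checked numerically. The following is a minimal sketch in plain Python, assuming the usual conventions $|0\rangle, |1\rangle$ for the $\sigma_z$ eigenvectors and $|\pm\rangle = (|0\rangle \pm |1\rangle)/\sqrt{2}$ for the $\sigma_x$ eigenvectors; the helper names (`kron_vec`, `kron_mat`, `expectation`) are illustrative and not part of the text. It builds both expressions independently, compares them up to a global phase, and confirms the perfect anti-correlation for $\sigma_z$ and for $\sigma_x$.

```python
import math

def kron_vec(u, v):
    """Tensor product of two state vectors given as lists."""
    return [a * b for a in u for b in v]

def kron_mat(A, B):
    """Tensor product of two square matrices given as nested lists."""
    m = len(B)
    size = len(A) * m
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(size)]
            for i in range(size)]

def expectation(state, M):
    """<state| M |state> for a real state vector and a real matrix."""
    n = len(state)
    return sum(state[i] * M[i][j] * state[j] for i in range(n) for j in range(n))

s = 1 / math.sqrt(2)
zero, one = [1.0, 0.0], [0.0, 1.0]   # sigma_z eigenvectors
plus, minus = [s, s], [s, -s]        # sigma_x eigenvectors
sigma_z = [[1.0, 0.0], [0.0, -1.0]]
sigma_x = [[0.0, 1.0], [1.0, 0.0]]

# The singlet written in the sigma_z basis and in the sigma_x basis.
singlet_z = [s * (a - b) for a, b in zip(kron_vec(zero, one), kron_vec(one, zero))]
singlet_x = [s * (a - b) for a, b in zip(kron_vec(plus, minus), kron_vec(minus, plus))]

# An overlap of modulus 1 means the two vectors differ only by a global phase.
overlap = abs(sum(a * b for a, b in zip(singlet_z, singlet_x)))
corr_zz = expectation(singlet_z, kron_mat(sigma_z, sigma_z))
corr_xx = expectation(singlet_z, kron_mat(sigma_x, sigma_x))

print("overlap between the two expressions:", round(overlap, 6))
print("<sigma_z sigma_z> =", round(corr_zz, 6))
print("<sigma_x sigma_x> =", round(corr_xx, 6))
```

A correlation of $-1$ in both bases is what allows the outcome on the second qubit to be predicted from the outcome on the first.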


EPR’s discussion on physical reality is based on Newtonian (classical) mechanics, which is suitable only to describe the motion of macroscopic objects. The study of the motion of bodies is an ancient one, making classical mechanics one of the oldest and largest subjects in science. It is also the physical theory that describes most of the phenomena we deal with in our daily life, and hence it is not surprising that our intuition is guided by this way of thinking. EPR go even further, using these ideas as impositions on what we should call physical reality. This line of thought is not necessarily valid for quantum systems, as we already discussed in chapter 2 and appendix A.

The debate in EPR’s paper is of great importance from both the physical and the philosophical point of view. This issue deserves a much deeper analysis than the one presented here, and many people have devoted their time to investigating it. See [Staa] and references therein for a more detailed discussion on the subject.


B.2 Local Hidden-Variable Models 


If quantum theory is not complete, we should seek other theories that assign definite outcomes to all measurements and, at the same time, agree with all quantum predictions. We continue with the same nomenclature used in chapter 2 and call such theories hidden-variable models compatible with quantum theory. EPR believed in the existence of such theories. We have already proved that, under the assumption of noncontextuality, these theories cannot exist. In section B.3 we prove that, under the assumption of locality, these models also do not exist.

The hypothesis of locality is crucial in EPR’s argument. It states that physical processes occurring at one place should have no immediate effect at another location. This appears to be a reasonable assumption to make, as it is a consequence of special relativity, which states that information can never be transmitted faster than the speed of light. This assumption is explicit in their argument, since they assume that the measurement performed on one particle does not influence the other. EPR’s assumption is generally referred to as local realism, as it is the combination of the principle of locality with the realistic assumption that all systems must objectively have a pre-existing value for any possible measurement before the measurement is made.

The assumption of local realism has an immediate consequence for the probability distribution describing the measurements performed on a composite system. If we assume this condition, a complete description of the system has to give predefined values for all measurements in all subsystems and, at the same time, the value obtained in one subsystem cannot depend on the measurement performed on any other subsystem.

Within this perspective, any uncertainty in the outcomes of each measurement comes from the fact that the previous history of the composite system is not known. With the locality assumption, any correlation among the results of the measurements is a consequence of the past interaction among the parties. Let $\lambda$ be a set of variables describing the past history of the composite system. They play the role of hidden variables in a hidden-variable model. Once these variables are known, there is no correlation between the outcomes in each subsystem. As a consequence, the statistics of the experiment can be written as

$$p(a_1, \ldots, a_n | A_1, \ldots, A_n) = \sum_{\lambda} p(\lambda)\, p(a_1 | A_1, \lambda) \times \cdots \times p(a_n | A_n, \lambda), \tag{B.4}$$

where $p(a_1, \ldots, a_n | A_1, \ldots, A_n)$ is the probability of getting the set of outcomes $a_1, \ldots, a_n$ when measurement $A_i$ is performed on part $i$, $p(a_i | A_i, \lambda)$ is the probability of getting $a_i$ in measurement $A_i$ on the $i$-th subsystem given the past history $\lambda$, and $p(\lambda)$ is the probability distribution of the hidden variable $\lambda$.

Equation (B.4) provides a mathematical way of verifying whether the statistics of a given experiment are consistent with the assumption of local realism. If this is the case, it should be possible to write the probability distribution in the form given by this equation. In the next section we prove that this is not always possible if the statistics are obtained from quantum systems.


B.3 Bell’s proof of the impossibility of hidden 
variables compatible with quantum theory 

Suppose we have a pair of qubits in the singlet state. Any measurement on one qubit with possible outcomes $\pm 1$ can be written in the form

$$R = \vec{r} \cdot \vec{\sigma} = r_1 \sigma_x + r_2 \sigma_y + r_3 \sigma_z,$$

where $\vec{r} = (r_1, r_2, r_3)$ is a real unit vector.

Let us suppose also that a given hidden-variable model provides definite values for the measurements performed on each qubit. If this model satisfies the locality assumption, the value of such a measurement performed on one of the qubits depends only on the vector $\vec{r}$ and on the hidden variable $\lambda$. We will denote this value by $v_i(\vec{r}, \lambda)$, where $i = 1, 2$ denotes the qubit on which the measurement is performed.

Since the qubits are in the singlet state, the results are anti-correlated if the same measurement is made on both qubits. Hence,

$$v_1(\vec{r}, \lambda) = -v_2(\vec{r}, \lambda).$$


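Also, the quantum expectation value for the measurement of $R = \vec{r} \cdot \vec{\sigma}$ in the first qubit and $S = \vec{s} \cdot \vec{\sigma}$ in the second qubit is equal to

$$\langle RS \rangle_Q = -\vec{r} \cdot \vec{s},$$

and it must agree with the expectation value calculated using the hidden-variable model, which is

$$\langle RS \rangle = \sum_{\lambda} p(\lambda)\, v_1(\vec{r}, \lambda)\, v_2(\vec{s}, \lambda) = -\sum_{\lambda} p(\lambda)\, v_1(\vec{r}, \lambda)\, v_1(\vec{s}, \lambda).$$

It follows that for any other measurement $T = \vec{t} \cdot \vec{\sigma}$ we have

$$\langle RS \rangle - \langle RT \rangle = -\sum_{\lambda} p(\lambda) \left[ v_1(\vec{r}, \lambda) v_1(\vec{s}, \lambda) - v_1(\vec{r}, \lambda) v_1(\vec{t}, \lambda) \right] \tag{B.5}$$

$$= \sum_{\lambda} p(\lambda)\, v_1(\vec{r}, \lambda) v_1(\vec{s}, \lambda) \left[ v_1(\vec{s}, \lambda) v_1(\vec{t}, \lambda) - 1 \right], \tag{B.6}$$

where the second line uses $v_1(\vec{s}, \lambda)^2 = 1$. Since $|v_1(\vec{r}, \lambda) v_1(\vec{s}, \lambda)| = 1$ and $1 - v_1(\vec{s}, \lambda) v_1(\vec{t}, \lambda) \geq 0$, it follows that

$$|\langle RS \rangle - \langle RT \rangle| \leq \sum_{\lambda} p(\lambda) \left[ 1 - v_1(\vec{s}, \lambda) v_1(\vec{t}, \lambda) \right] = 1 + \langle ST \rangle.$$

If the hidden-variable model agrees with the quantum prediction, every expectation value $\langle RS \rangle$ equals $-\vec{r} \cdot \vec{s}$, and we have that

$$|\vec{r} \cdot \vec{s} - \vec{r} \cdot \vec{t}| \leq 1 - \vec{s} \cdot \vec{t}, \tag{B.7}$$

an inequality that must hold for every choice of $\vec{r}$, $\vec{s}$ and $\vec{t}$.

Now, if we choose coplanar unit vectors with $\vec{s}$ at an angle of $60^\circ$ from $\vec{r}$ and $\vec{t}$ at an angle of $120^\circ$ from $\vec{r}$, we have $\vec{r} \cdot \vec{s} = \vec{s} \cdot \vec{t} = 1/2$ and $\vec{r} \cdot \vec{t} = -1/2$. The left-hand side of the inequality is then equal to 1, while the right-hand side is equal to 1/2, which contradicts inequality (B.7). This proves that the conclusions obtained with the assumption of local realism do not agree with quantum theory.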

Theorem 44. There is no local hidden-variable model compatible with quantum theory. 
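
The bound $|\langle RS \rangle - \langle RT \rangle| \leq 1 + \langle ST \rangle$ derived above can be tested directly against the singlet-state expectation values. The sketch below is plain Python and rests on assumptions made only for illustration: the three directions are taken in the $x$-$z$ plane (so all matrices are real) at angles $0^\circ$, $60^\circ$ and $120^\circ$ from the $z$ axis, and the helper names are hypothetical.

```python
import math

def spin(theta):
    """Observable r.sigma for a unit vector r in the x-z plane, at angle theta from z."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]

def kron_mat(A, B):
    """Tensor product of two square matrices given as nested lists."""
    m = len(B)
    size = len(A) * m
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(size)]
            for i in range(size)]

def expectation(state, M):
    """<state| M |state> for a real state vector and a real matrix."""
    n = len(state)
    return sum(state[i] * M[i][j] * state[j] for i in range(n) for j in range(n))

h = 1 / math.sqrt(2)
singlet = [0.0, h, -h, 0.0]  # (|01> - |10>)/sqrt(2) in the basis 00, 01, 10, 11

def corr(theta_1, theta_2):
    """Singlet expectation value of (r1.sigma) on qubit 1 and (r2.sigma) on qubit 2."""
    return expectation(singlet, kron_mat(spin(theta_1), spin(theta_2)))

# Illustrative choice (an assumption of this sketch): coplanar directions 60 degrees apart.
r, s, t = 0.0, math.pi / 3, 2 * math.pi / 3

lhs = abs(corr(r, s) - corr(r, t))
rhs = 1 + corr(s, t)

print("<RS> equals -r.s:", math.isclose(corr(r, s), -math.cos(r - s)))
print("|<RS> - <RT>| =", round(lhs, 6))
print("1 + <ST>      =", round(rhs, 6))
print("bound violated by the singlet:", lhs > rhs)
```

With these directions the left-hand side is 1 and the right-hand side is 1/2, so no local hidden-variable assignment can reproduce the singlet statistics.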


B.4 Bell Inequalities

There are many other linear inequalities which can be obtained assuming the hypothesis of local realism and which are violated in some experimental situations involving quantum systems. All of these inequalities are called Bell inequalities, named after Bell's pioneering discovery, inequality (B.7). It is possible to find a huge number of non-equivalent Bell inequalities in the literature, and work has been devoted to creating a database to collect and organize all these examples.

B.4.1 The CHSH inequality 

The most famous and also the simplest Bell inequality was derived by Clauser, Horne, Shimony and Holt [CHSH69]. This inequality is known as the CHSH inequality.

In the corresponding experimental scenario, there are four measurements available in a bipartite system, two measurements in each subsystem. Each measurement has two possible outcomes, which we denote by $\pm 1$.

Let us denote the measurements in the first subsystem by $A_1$ and $A_2$ and the measurements in the second subsystem by $B_1$ and $B_2$. Given a choice of measurement in each subsystem, $p(a, b | A_i, B_j)$ will denote the joint probability of having outcome $a$ in the first subsystem and $b$ in the second subsystem. The expectation value of the joint measurement of $A_i$ and $B_j$ is

$$\langle A_i B_j \rangle = p(1, 1 | A_i, B_j) + p(-1, -1 | A_i, B_j) - p(-1, 1 | A_i, B_j) - p(1, -1 | A_i, B_j).$$

Consider now that the outcomes of $A_i$ and $B_j$ are given by a local hidden-variable model. Then we have

$$p(a, b | A_i, B_j) = \sum_{\lambda} p(\lambda)\, p(a | A_i, \lambda)\, p(b | B_j, \lambda).$$

All probability vectors of this form can be written as convex combinations of the ones assigning definite values to each measurement locally. We will focus first on those distributions. The definite values assigned to each measurement by the model will be denoted by $v(A_i)$ and $v(B_j)$. In this case we have

$$\langle A_i B_j \rangle = v(A_i)\, v(B_j). \tag{B.8}$$

Now consider the sum

$$S_{CHSH} = \langle A_1 B_1 \rangle + \langle A_1 B_2 \rangle + \langle A_2 B_1 \rangle - \langle A_2 B_2 \rangle. \tag{B.9}$$

If these values are given by equation (B.8), we have

$$S_{CHSH} = v(A_1) v(B_1) + v(A_1) v(B_2) + v(A_2) v(B_1) - v(A_2) v(B_2)$$

$$= v(A_1) \left( v(B_1) + v(B_2) \right) + v(A_2) \left( v(B_1) - v(B_2) \right). \tag{B.10}$$

Since the possible outcomes are $\pm 1$, one of the terms $v(B_1) + v(B_2)$ and $v(B_1) - v(B_2)$ vanishes while the other equals $\pm 2$, so $S_{CHSH}$ is either 2 or $-2$. Taking convex combinations of these distributions we conclude that if some distribution is given by a local hidden-variable model we have

$$-2 \leq \langle A_1 B_1 \rangle + \langle A_1 B_2 \rangle + \langle A_2 B_1 \rangle - \langle A_2 B_2 \rangle \leq 2. \tag{B.11}$$

The second inequality is the famous CHSH inequality.

Now we see what can happen if we use a quantum system.

Example 27. Consider again the singlet state $|\Psi_-\rangle$ and the measurements $A_1 = \sigma_z$, $A_2 = \sigma_x$, $B_1 = \frac{-\sigma_x - \sigma_z}{\sqrt{2}}$ and $B_2 = \frac{\sigma_x - \sigma_z}{\sqrt{2}}$. In this case we have $S_{CHSH} = 2\sqrt{2}$, which violates the local bound of 2 given by the CHSH inequality (B.11). This is the maximum value obtained with quantum distributions, and this bound is called the Tsirelson bound for the CHSH inequality [Gr85].
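
Both bounds can be reproduced with a short computation. The sketch below, in plain Python, first enumerates the 16 local deterministic assignments of $\pm 1$ values and collects the resulting values of $S_{CHSH}$, then evaluates $S_{CHSH}$ on the singlet. The quantum part assumes the observables $A_1 = \sigma_z$, $A_2 = \sigma_x$, $B_1 = (-\sigma_x - \sigma_z)/\sqrt{2}$ and $B_2 = (\sigma_x - \sigma_z)/\sqrt{2}$; the helper names are illustrative only.

```python
import itertools
import math

def chsh(e11, e12, e21, e22):
    """S_CHSH = <A1B1> + <A1B2> + <A2B1> - <A2B2>."""
    return e11 + e12 + e21 - e22

# Local deterministic models: every measurement has a fixed value +1 or -1.
local_values = set()
for a1, a2, b1, b2 in itertools.product([1, -1], repeat=4):
    local_values.add(chsh(a1 * b1, a1 * b2, a2 * b1, a2 * b2))

def pauli_combo(cx, cz):
    """Observable cx*sigma_x + cz*sigma_z as a real 2x2 matrix."""
    return [[cz, cx], [cx, -cz]]

def kron_mat(A, B):
    """Tensor product of two square matrices given as nested lists."""
    m = len(B)
    size = len(A) * m
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(size)]
            for i in range(size)]

def expectation(state, M):
    """<state| M |state> for a real state vector and a real matrix."""
    n = len(state)
    return sum(state[i] * M[i][j] * state[j] for i in range(n) for j in range(n))

h = 1 / math.sqrt(2)
singlet = [0.0, h, -h, 0.0]  # (|01> - |10>)/sqrt(2)

A = [pauli_combo(0.0, 1.0), pauli_combo(1.0, 0.0)]  # A1 = sigma_z, A2 = sigma_x
B = [pauli_combo(-h, -h), pauli_combo(h, -h)]       # B1, B2 as assumed above

E = [[expectation(singlet, kron_mat(a, b)) for b in B] for a in A]
quantum_value = chsh(E[0][0], E[0][1], E[1][0], E[1][1])

print("values of S_CHSH over local deterministic models:", sorted(local_values))
print("S_CHSH for the singlet:", round(quantum_value, 6))
print("2*sqrt(2)             :", round(2 * math.sqrt(2), 6))
```

The deterministic strategies only ever give $\pm 2$, while the singlet reaches $2\sqrt{2}$ with these settings.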


B.5 Bell inequalities and convex geometry 


We can define more precisely the scenario we are working with, in a similar way as was done in section 2.3. Once more we start with a set of possible measurements $X$, and the main difference from what was done before is that now we assume that the system is composed of $n$ spatially separated subsystems. The set $X$ is then divided into disjoint subsets $X_1, X_2, \ldots, X_n$, where $X_i$ is the set of measurements available to party $i$. In this case, compatibility is guaranteed by the spatial separation among the parties, and all contexts are of the form

$$C = \{M_1, M_2, \ldots, M_n\}, \qquad M_i \in X_i.$$

Scenarios with these extra restrictions are called Bell scenarios. The particular case in which every party has $m$ measurements available, each measurement with $o$ possible outcomes, is denoted by $(n, m, o)$.

The vertices of the compatibility hypergraph of a Bell scenario can be split into the $n$ disjoint subsets $X_i$. Each edge has one, and only one, element of each $X_i$. In the bipartite case $n = 2$, this graph is the complete bipartite graph $G = (X_1, X_2)$.

The probability distributions for Bell scenarios can be denoted in a simple way. Given a context $C = \{M_1, M_2, \ldots, M_n\}$,

$$p(m_1, m_2, \ldots, m_n | M_1, M_2, \ldots, M_n)$$

will denote the probability of the set of outcomes $m_1, m_2, \ldots, m_n$ when each measurement $M_i$ is performed by party $i$.

The no-disturbance property in this case is a very reasonable restriction to make. It is a consequence of the assumption that the measurements performed at one site do not affect any other instantaneously, since no information can travel faster than the speed of light. In this context, this property is referred to as the no-signaling condition. The set of no-signaling distributions is a polytope, since it is defined by a finite set of linear equalities and inequalities.

The noncontextual distributions of a Bell scenario are exactly the ones for which a local 
hidden-variable model can be constructed. 

Definition 71. A probability distribution $p$ for a Bell scenario is called local if it can be written in the form

$$p(m_1, m_2, \ldots, m_n | M_1, M_2, \ldots, M_n) = \sum_{\lambda} p(\lambda) \prod_{i=1}^{n} p(m_i | M_i, \lambda),$$

where $p(\lambda)$ is a probability distribution on the hidden variable $\lambda$.

Since the set of local distributions $\mathcal{L}$ is the convex hull of a finite set, it is a polytope. The H-description of this polytope corresponds to a finite set of Bell inequalities providing necessary and sufficient conditions for membership in $\mathcal{L}$.

Definition 72. A Bell inequality is a linear inequality

$$S = \sum \gamma_{m_1, m_2, \ldots, m_n | M_1, M_2, \ldots, M_n}\, p(m_1, m_2, \ldots, m_n | M_1, M_2, \ldots, M_n) \leq b,$$

where $\gamma_{m_1, m_2, \ldots, m_n | M_1, M_2, \ldots, M_n}$ and $b$ are real numbers, which is satisfied by all local (classical) distributions and violated by some nonlocal distribution. A tight Bell inequality is a linear inequality defining a non-trivial facet of the local polytope $\mathcal{L}$.


In general, quantum distributions do not satisfy all Bell inequalities, as we saw in example 27. This behavior is often referred to as quantum nonlocality. The maximal quantum value for $S$ is called the Tsirelson bound for the inequality [Cir80].


B.6 Final Remarks 

In this chapter we have shown once more that, under very reasonable circumstances, a completion of quantum theory by a hidden-variable model is not possible. The impossibility proofs are based on multipartite scenarios and rely on the fact that, according to special relativity, information cannot travel faster than light. This restriction imposes the condition that what is done by one party cannot instantaneously affect any other, and hence that our hidden-variable models have to be local.

The first impossibility proof in this situation was provided by John Bell [Bel64], who derived an inequality for the expectation values of joint measurements on a pair of qubits in the singlet state that should be valid if those values were given by a hidden-variable model. This inequality is not always valid for quantum distributions, which proves that these models cannot reproduce the statistics of quantum theory for this state.

After Bell's work many other inequalities satisfied by local hidden-variable models and violated by some quantum distributions were derived. The simplest and also most famous is the CHSH inequality [CHSH69]. Violations of Bell inequalities prove that the assumption of local realism is incompatible with quantum theory. Locality and realism are features of classical theory, properties of our daily life experience, that cannot be applied at the same time in the description of quantum systems. There is a huge amount of work on the subject, aimed both at finding new inequalities and at finding applications for different types of inequalities (see [BCP+13] and references therein).

There are also many experimental implementations leading to the violation of a Bell inequality [Wika]. The first one was performed in 1972 by Stuart J. Freedman and John F. Clauser [FC72]. Modern experiments are very precise, but unfortunately none of them is able to fulfill all requirements necessary to actually eliminate the possibility of hidden-variable models describing the system involved according to our classical conceptions. The failures in these experiments are generally called loopholes [Wike].

The most common of these failures are the detection loophole and the locality loophole. The detection loophole comes from the fact that all detectors (or measurement devices) are imperfect: a portion of the prepared systems is always lost before detection. Hence, the data obtained in the experiment are incomplete. It is possible that the missing data create the illusion of a violation of the inequality, while if we took the lost events into account in the statistics we would have a local distribution.

The locality loophole appears because in some implementations it is not possible to guarantee that the subsystems are sufficiently far apart from each other. We need to make sure that what happens in one laboratory does not affect the results in the other. To do that we have to ensure that the process of choosing a measurement, performing it and getting an outcome is completed before any signal can travel from one site to the other. This was first done in 1981, when Alain Aspect and collaborators performed the pioneering experiment of violation of the CHSH inequality [ADR82]. This experiment does not eliminate the detection loophole. Since that time, many improvements have been made. The photon is the first experimental system for which all main experimental loopholes have been surmounted, albeit presently only in separate experiments [GMR+13, CMA+13]. We believe that a loophole-free implementation will soon be achieved.

Appendix C

What explains the Tsirelson bound? 


Quantum probability distributions may exhibit nonlocality, a feature that is revealed by the violation of a Bell inequality. In most cases it is possible to find distributions that violate these inequalities more than the quantum distributions do. What is the physical explanation for that? Why isn't quantum theory more nonlocal than it is? For a given scenario, what distinguishes the set of quantum probability distributions from others obtained with general probability theories? In this chapter we discuss the various physical principles proposed to answer this question.

In section C.1 we show that the no-signaling principle, implied by the relativistic imposition that no signal can travel faster than the speed of light, is not enough to rule out violations higher than the Tsirelson bound. Nonetheless, the existence of some of these distributions has implausible consequences for communication complexity, which we examine in section C.2.


The principle of Information Causality states that the information gain that one can get about the data of a spatially distant observer, by using all his local resources and $m$ classical bits sent to him by this observer, is at most $m$ bits. It is a generalization of the no-signaling principle, which is just Information Causality with $m = 0$. This principle is satisfied by quantum distributions, but discards many others outside the quantum set, as we will see in section C.3.

The principle of Macroscopic Locality, the subject of section C.4, states that any physical theory should recover the classical results when we measure a large number of systems and our devices are not capable of identifying individual particles. It is not equivalent to Information Causality and it is also known that it cannot recover the quantum set. Nonetheless, it is a reasonable property we should expect from any alternative to quantum theory.

In section C.5 we show that no bipartite principle is capable of ruling out some non-quantum distributions. This proves that intrinsically multipartite principles must be found. The first one is the principle of Local Orthogonality, the Exclusivity principle applied to Bell scenarios. It can be used to rule out many non-quantum distributions, including some of the distributions that cannot be ruled out by any bipartite principle. This principle and some implications are discussed in section C.6. We finish this appendix in section C.7 with our final remarks.


C.1 No-signaling

We have seen in appendix B that relativistic causality is a reasonable imposition to make on the acceptable probability distributions in a Bell scenario. This restriction is a consequence of special relativity, which states that no signal can travel faster than the speed of light. Quantum theory does not violate this principle, but more general probabilistic theories might. In 1993, Popescu and Rohrlich proposed to take non-locality as the quantum principle and analyze what this assumption, together with relativistic causality, would imply.

We consider once again a bipartite scenario where each subsystem is far away from the 
other. Relativistic causality implies that if no signal was sent from one party to the other, 
one of the parties can get no information about the measurements applied in the other party 
nor about the results obtained. The mathematical consequence of this assumption is that the 
distribution must obey the following principle: 

Principle 2 (The no-signaling principle). Probability distributions in a Bell scenario satisfy

$$\sum_{a_2} P(a_1, a_2 | x_1, x_2) = P(a_1 | x_1), \qquad \sum_{a_1} P(a_1, a_2 | x_1, x_2) = P(a_2 | x_2), \tag{C.1}$$

where $x_1$ is a measurement in party one with possible outputs $a_1$ and $x_2$ is a measurement in party two with possible outputs $a_2$.

These distributions are called no-signaling. 

We want to see now what are the consequences of taking non-locality and relativistic 
causality as fundamental axioms. Would that be enough to single out the set of quantum 
distributions? Is quantum theory the only one exhibiting non-locality while preserving relativistic 
causality? 

Let us see what happens with the CHSH inequality

$$S_{CHSH} = \langle A_1 B_1 \rangle + \langle A_1 B_2 \rangle + \langle A_2 B_1 \rangle - \langle A_2 B_2 \rangle \leq 2.$$

The quantum maximum is $2\sqrt{2}$, although the algebraic maximum is 4. What physical principle prevents quantum distributions from reaching the algebraic maximum? What singles out the bound of $2\sqrt{2}$? Is it relativistic causality?

Popescu and Rohrlich found a simple example showing that the no-signaling restriction is not enough to rule out non-quantum correlations. The distribution is known as the PR box.

Example 28 (PR-box). Suppose that in a bipartite system one party can measure $A_1$ and $A_2$ and the other $B_1$ and $B_2$, each with possible outcomes $\pm 1$. Consider the distribution in the table below:

| $ij$ | $(1,1)$ | $(1,-1)$ | $(-1,1)$ | $(-1,-1)$ |
|------|---------|----------|----------|-----------|
| 11   | 0.5     | 0        | 0        | 0.5       |
| 12   | 0.5     | 0        | 0        | 0.5       |
| 21   | 0.5     | 0        | 0        | 0.5       |
| 22   | 0       | 0.5      | 0.5      | 0         |

where the number in column $(a, b)$ and row $ij$ is the probability of outcome $a$ for measurement $A_i$ and outcome $b$ for measurement $B_j$. This distribution is no-signaling, but it reaches the algebraic maximum for the CHSH inequality.
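
Both claims about the PR box can be verified mechanically from the table. The following plain-Python sketch stores the four rows of the table, checks that each party's marginal does not depend on the other party's choice of measurement, and evaluates $S_{CHSH}$. The function and variable names are illustrative choices, not notation from the text.

```python
# Columns of the table, in order: outcomes (a, b).
outcomes = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

# Rows of the table: key (i, j) means Alice measures A_i and Bob measures B_j.
pr_box = {
    (1, 1): [0.5, 0.0, 0.0, 0.5],
    (1, 2): [0.5, 0.0, 0.0, 0.5],
    (2, 1): [0.5, 0.0, 0.0, 0.5],
    (2, 2): [0.0, 0.5, 0.5, 0.0],
}

def prob(a, b, i, j):
    """p(a, b | A_i, B_j) read from the table."""
    return pr_box[(i, j)][outcomes.index((a, b))]

def marginal_alice(a, i, j):
    """Probability of Alice's outcome a when the settings are (A_i, B_j)."""
    return sum(prob(a, b, i, j) for b in (1, -1))

def marginal_bob(b, i, j):
    """Probability of Bob's outcome b when the settings are (A_i, B_j)."""
    return sum(prob(a, b, i, j) for a in (1, -1))

# No-signaling: a party's marginal is the same for both settings of the other party.
no_signaling = (
    all(marginal_alice(a, i, 1) == marginal_alice(a, i, 2)
        for a in (1, -1) for i in (1, 2))
    and all(marginal_bob(b, 1, j) == marginal_bob(b, 2, j)
            for b in (1, -1) for j in (1, 2))
)

def correlator(i, j):
    """Expectation value <A_i B_j>."""
    return sum(a * b * prob(a, b, i, j) for a, b in outcomes)

s_chsh = correlator(1, 1) + correlator(1, 2) + correlator(2, 1) - correlator(2, 2)

print("no-signaling:", no_signaling)
print("S_CHSH =", s_chsh)
```

The first three correlators equal $+1$ and the last equals $-1$, which is how the sum reaches 4.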

The PR box shows that relativistic causality is not enough to distinguish quantum theory from more general ones. The impossibility of being represented by local hidden-variable models is a property of a broad class of no-signaling theories. Although such boxes satisfy the no-signaling principle, their existence would imply many unreasonable consequences.

C.2 Implausible consequences of superstrong
non-locality

Violations above the quantum threshold are often called superstrong non-locality. The PR box is a simple example of a distribution exhibiting this feature. In this section we will show that the existence of this kind of distribution leads to implausible consequences for the theory of communication complexity, which describes how much communication is needed between two parties to evaluate a distributed function $f$ [vD12, BBL+06].

Definition 73. A distributed function is a Boolean function

$$f : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}, \qquad (x, y) \mapsto f(x, y), \tag{C.2}$$

where the strings $x$ and $y$ are in the possession of spatially separated parties, Alice and Bob, who must communicate in order to compute $f$.

By communicating with each other one bit at a time according to some preestablished protocol, they have to compute the value of $f(x, y)$ in such a way that at least one of them knows the value at the end of the protocol. Let $n_f(x, y)$ denote the minimum number of bits exchanged between them in order to accomplish this task. This number does not depend only on $f$; it may also depend on the resources available to both parties. Once the resources are fixed, we can define the communication complexity of $f$.

Definition 74. Given the resources shared between the parties, the communication complexity of the distributed function $f$ is

$$c(f) = \max_{x, y} n_f(x, y), \tag{C.3}$$

where the maximum is taken over all pairs $(x, y) \in \{0,1\}^n \times \{0,1\}^n$.

For some functions $f$, the protocols using quantum systems can be more efficient than the ones assuming only classical correlations between the parties. Hence, the communication complexity can decrease in the presence of entanglement. In other cases, such as for the inner product function

$$IP(x, y) = \sum_i x_i y_i,$$

the communication complexity is effectively not affected when the parties share quantum correlated systems. Our purpose in this section is to prove that if the parties share systems correlated according to the distribution of a PR box, the communication complexity is reduced to one bit for all distributed functions of the form (C.2).

First, we will see that this is the case when $f = IP$. Suppose that the parties share at least $n$ PR boxes. In box $i$ Alice will perform measurement $A_{x_i}$, getting outcome $a_i$, and Bob will perform measurement $B_{y_i}$, getting outcome $b_i$. After relabeling the measurements 1, 2 as inputs 0, 1 and the outcomes $+1, -1$ as bits 0, 1, the PR box distribution is such that for all $i$, $a_i + b_i = x_i y_i$, where all sums and products are taken modulo two. Hence, we have

$$IP(x, y) = \sum_i x_i y_i = \sum_i (a_i + b_i) = \sum_i a_i + \sum_i b_i.$$

The strings $a_i$ and $b_i$ are computed locally and this step does not require any communication between the parties. After those strings are obtained, Alice, for example, computes $\sum_i a_i$ locally and then sends the resulting bit to Bob, who is now able to evaluate $IP(x, y)$ by adding it to $\sum_i b_i$.
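
The one-bit protocol for the inner product can be simulated as follows. This is only a sketch in plain Python: the function `pr_box` is a hypothetical classical simulation that sees both inputs at once, which no real pair of separated devices could do; it merely reproduces the statistics $a + b = xy$ (mod 2) with a uniformly random local outcome, so that the bookkeeping of the protocol can be checked.

```python
import random

def pr_box(x, y, rng):
    """Simulated PR box with bit inputs: outputs satisfy a XOR b = x AND y."""
    a = rng.randint(0, 1)   # Alice's output is uniformly random
    b = a ^ (x & y)         # Bob's output is then fixed by the PR-box rule
    return a, b

def inner_product_with_one_bit(x, y, rng):
    """Compute IP(x, y) mod 2 using one PR box per position and one message bit."""
    pairs = [pr_box(xi, yi, rng) for xi, yi in zip(x, y)]
    alice_bit = sum(a for a, _ in pairs) % 2     # the single bit Alice sends
    bob_parity = sum(b for _, b in pairs) % 2    # computed locally by Bob
    return (alice_bit + bob_parity) % 2

rng = random.Random(0)
n = 8
all_correct = True
for _ in range(1000):
    x = [rng.randint(0, 1) for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    expected = sum(xi * yi for xi, yi in zip(x, y)) % 2
    if inner_product_with_one_bit(x, y, rng) != expected:
        all_correct = False

print("protocol correct on all sampled inputs:", all_correct)
```

Only `alice_bit` crosses from Alice to Bob, regardless of the length $n$ of the input strings.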

The same thing happens for all other $f$. This happens because any function of the form (C.2) can be written as a composition of $IP$ and local polynomials in $x$ and $y$.

Proposition 5. Let $f$ be a distributed function, given according to definition 73. There are polynomial functions $P_i : \{0,1\}^n \to \{0,1\}$ and $Q_i : \{0,1\}^n \to \{0,1\}$ such that

$$f(x, y) = \sum_i P_i(x)\, Q_i(y). \tag{C.4}$$

The functions $P_i$ and $Q_i$ depend only on $f$, and hence the strings $w_i = P_i(x)$ and $z_i = Q_i(y)$ can be computed locally by each party, without any communication. After that they can apply the protocol above to compute $IP(w, z)$, and hence compute $f$ with only one bit of communication.
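
One simple way to see that a decomposition of the form (C.4) always exists is to use one term per possible input string. The sketch below, in plain Python, takes $P_z(x) = 1$ if $x = z$ and 0 otherwise, and $Q_z(y) = f(z, y)$, for every $n$-bit string $z$. This particular construction is an assumption made here for illustration: it uses $2^n$ terms and is not necessarily the decomposition intended by the proposition. The equality function serves as a test case.

```python
import itertools

def decompose(f, n):
    """Return local functions P, Q with f(x, y) = sum_i P[i](x) * Q[i](y) mod 2.

    Illustrative construction with one term per n-bit string z:
    P_z(x) = [x == z] depends only on x, and Q_z(y) = f(z, y) depends only on y.
    """
    strings = list(itertools.product([0, 1], repeat=n))
    P = [lambda x, z=z: int(tuple(x) == z) for z in strings]
    Q = [lambda y, z=z: f(z, tuple(y)) for z in strings]
    return P, Q

def equality(x, y):
    """Example distributed function: 1 if the two strings coincide, 0 otherwise."""
    return int(tuple(x) == tuple(y))

n = 3
P, Q = decompose(equality, n)
strings = list(itertools.product([0, 1], repeat=n))

decomposition_ok = True
for x in strings:
    w = [p(x) for p in P]          # computed by Alice alone
    for y in strings:
        z = [q(y) for q in Q]      # computed by Bob alone
        ip = sum(wi * zi for wi, zi in zip(w, z)) % 2
        if ip != equality(x, y):
            decomposition_ok = False

print("number of terms:", len(P))
print("IP(w, z) equals f(x, y) on all inputs:", decomposition_ok)
```

Once $w$ and $z$ are available locally, the inner-product protocol with PR boxes applies to them, so a single bit of communication suffices, at the cost of one box per term.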

The notion of communication complexity in the presence of PR boxes is then meaningless, since every function requires only one bit to be exchanged in order to be computed. Although this does not contradict any physical principle, it does contradict our experience that certain computational tasks are harder than others. It has been shown that trivial communication complexity can be achieved with violations strictly less than 4, but it is still not clear whether the Tsirelson bound for the CHSH inequality is a critical value that separates trivial from nontrivial communication complexity. If this is indeed the case, non-triviality of communication complexity would be a principle singling out the quantum bound.


C.3 Information Causality 

Information Causality, proposed in reference [PPK+09], is a generalization of the no-signaling principle. It is respected by both classical and quantum theories and violated by some non-quantum distributions. Suppose Alice possesses some previously assembled data, unknown to some other party, Bob. She is allowed to send only classical bits to him. Information Causality states that:

Principle 3. The information gain that Bob can reach about Alice's data by using all his local 
resources and m classical bits sent by her is at most m bits. 

The no-signaling condition is just Information Causality with m = 0. 

Consider now the following task: Alice receives a bit string $a = (a_0, a_1, \ldots, a_{N-1})$ and Bob receives $b \in \{0, 1, \ldots, N-1\}$. He is asked to give the value of Alice's bit $a_b$ after receiving $m$ classical bits from her. If Information Causality is respected, his information about $a$ is at most $m$ bits.

A good definition of his information about her string would be the mutual information between the string $a$ and everything that Bob has, namely the $m$-bit message $x$ and his part $B$ of all preshared correlations, $I(a : x, B)$. Information Causality would then imply $I(a : x, B) \leq m$. The problem with this definition is that it is not theory-independent: mutual information has to be defined using specific objects of the underlying theory, and it is not clear whether this definition can be made consistently for all theories, nor whether such a definition is unique [BBC+10].

Leaving aside the problem of defining mutual information, we will show that if such a definition can be made in a way that three elementary properties are satisfied, the principle of Information Causality holds and we can find a simple necessary condition, independent of the theory, for this principle to be satisfied.

To derive this necessary condition we will need the quantity $I$ defined below, which quantifies the efficiency of Alice and Bob's strategy to achieve their goal. Let $\beta$ be Bob's output. Then

$$I = \sum_i I(a_i : \beta \,|\, b = i), \tag{C.5}$$

where $I(a_i : \beta \,|\, b = i)$ is the Shannon mutual information between $a_i$ and $\beta$, given that $b = i$.

Theorem 45. Suppose that for a given theory a notion of mutual information $I(A : B)$ can be defined and that the following rules are satisfied:

I. Consistency: If the subsystems $A$ and $B$ are classical, $I(A : B)$ coincides with Shannon's mutual information;

II. Data processing inequality: Acting on one of the parties locally by any transformation allowed by the theory does not increase the mutual information $I(A : B)$. More formally, let $S_B$ be the state space of subsystem $B$ and $T : S_B \to S_B$ any transformation allowed by the theory in this subsystem. Then

$$I(A : B) \geq I(A : T(B)).$$

III. Chain rule: It is possible to define a conditional mutual information $I(A : B | C)$ in such a way that

$$I(A : B, C) = I(A : C) + I(A : B | C).$$

Then it is possible to prove that

1. The theory satisfies Information Causality;

2. $I(a : x, B) \geq I$.

It follows from item 1 that both classical and quantum theories satisfy Information Causality. In classical theory we use Shannon's mutual information and in quantum theory the mutual information coming from von Neumann's entropy. For both of them the three requirements of theorem 45 are fulfilled.

From item 2 we get the following necessary condition for Information Causality in Alice and Bob's protocol:

$$I \leq m. \tag{C.6}$$

The parameter $I$ is easier to work with because it does not depend on the underlying probabilistic theory. It depends solely on the input and output bits of their protocol. This condition allows us to prove that if Alice and Bob share PR boxes, Information Causality can be violated.

This violation can be achieved with a scheme known as van Dam's protocol. This is the simplest situation in which Information Causality can be violated. Alice receives two bits $(a_0, a_1)$ and is allowed to send only one bit to Bob. Alice uses $x = a_0 + a_1$ as the input of her part of the PR box and obtains outcome $a$. She sends the bit $m = a_0 + a$ to Bob. He uses as the input of his part of the PR box the bit $y = k$, which is 0 if he wants to learn the value of $a_0$ and 1 if he wants to learn the value of $a_1$. He gets output $b$. As we already mentioned, for the PR box inputs and outputs are related according to the rule $xy = a + b$, with all sums taken modulo two, and hence we have

$$(a_0 + a_1) k = a + b = a_0 + m + b,$$

$$b + m = a_0 + (a_0 + a_1) k. \tag{C.7}$$

Now, if $k = 0$, $b + m = a_0$, and if $k = 1$, $b + m = a_1$. Hence, if Bob adds Alice's message to his output of the PR box, he obtains with certainty the value of the bit he had to guess. Each term in (C.5) is then equal to one bit, so $I = 2$ although the message contains only one bit, clearly violating Information Causality.
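
The protocol is small enough to be checked exhaustively. In the plain-Python sketch below, `pr_box` is again a hypothetical classical simulation with access to both inputs, used only to reproduce the rule $a + b = xy$ (mod 2); it is not a model of how such a box could be physically realized.

```python
import itertools
import random

def pr_box(x, y, rng):
    """Simulated PR box with bit inputs: outputs satisfy a XOR b = x AND y."""
    a = rng.randint(0, 1)
    b = a ^ (x & y)
    return a, b

def van_dam(a0, a1, k, rng):
    """Bob's guess for Alice's bit a_k after receiving a single bit from her."""
    x = a0 ^ a1                 # Alice's input to the box
    a, b = pr_box(x, k, rng)    # Bob's input is the index k he is interested in
    message = a0 ^ a            # the only bit Alice sends
    return b ^ message          # Bob adds the message to his output (mod 2)

rng = random.Random(0)
always_correct = True
for a0, a1, k in itertools.product([0, 1], repeat=3):
    for _ in range(50):         # repeat to sample the box's random outputs
        guess = van_dam(a0, a1, k, rng)
        if guess != (a0, a1)[k]:
            always_correct = False

print("Bob always recovers the requested bit:", always_correct)
```

Bob's guess is right for either choice of $k$, even though Alice does not know which bit he wants when she sends her message.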

It is also possible to prove a much stronger result [PPK+09].

Theorem 46. If Alice and Bob can share distributions violating the CHSH inequality above the Tsirelson bound, they can violate Information Causality.


The idea behind the proof is the following: first, we note that any distribution can be
brought into a simple form where the local outcomes have a uniform distribution and the joint
distribution satisfies

$$p(a + b = xy) = \frac{1 + E}{2}, \qquad \text{(C.8)}$$

where $0 \leq E \leq 1$. The case $E = 1$ corresponds to the PR box and $E = 0$ to completely
uncorrelated bits. This transformation can be done locally and does not change the value of
$S_{CHSH}$. The classical bound is violated if $E > \frac{1}{2}$ and the quantum threshold becomes
$E = \frac{1}{\sqrt{2}}$. Whenever $E > \frac{1}{\sqrt{2}}$ we get a violation of Information Causality.
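
The two thresholds on $E$ can be checked with a short computation. The sketch below assumes the usual convention $S_{CHSH} = E_{00} + E_{01} + E_{10} - E_{11}$, with correlators $E_{xy} = p(a = b|xy) - p(a \neq b|xy)$; this sign convention is an assumption made here for illustration.

```latex
% For xy = 0 the rule a + b = xy means a = b; for xy = 1 it means a \neq b.
\begin{align*}
E_{00} = E_{01} = E_{10} &= \frac{1+E}{2} - \frac{1-E}{2} = E, \\
E_{11} &= \frac{1-E}{2} - \frac{1+E}{2} = -E, \\
S_{CHSH} &= E_{00} + E_{01} + E_{10} - E_{11} = 4E, \\
S_{CHSH} > 2 &\iff E > \tfrac{1}{2}, \qquad
S_{CHSH} = 2\sqrt{2} \iff E = \tfrac{1}{\sqrt{2}}.
\end{align*}
```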

In the protocol used to obtain this violation, Alice receives $N = 2^n$ bits and Bob receives a
list of $n$ bits telling him which of her bits he has to guess. She is allowed to send one bit
to him. Using a chain of pre-established systems correlated according to equation (C.8), they
can apply a protocol for which the probability of Bob correctly guessing the bit $a_k$ is

$$P_k = \frac{1}{2}\left(1 + E^n\right).$$

The Information Causality condition is violated as soon as $I > 1$, and this happens if
$2E^2 > 1$ and $n$ is large enough [PPK+09]. This proves that whenever the distribution violates
CHSH above the Tsirelson bound we can use it to implement a protocol violating Information
Causality.
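
The role of the condition $2E^2 > 1$ can be explored numerically. The following is a minimal sketch, assuming the lower bound $I \geq N\,(1 - h(P_k))$ with $N = 2^n$ and $h$ the binary entropy; this bound is an assumption not written out in the text above, and the function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Shannon entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def guess_probability(E, n):
    """P_k = (1 + E^n) / 2 for the nested protocol."""
    return 0.5 * (1.0 + E ** n)

def ic_lower_bound(E, n):
    """Assumed bound: I >= 2^n * (1 - h(P_k))."""
    return (2 ** n) * (1.0 - binary_entropy(guess_probability(E, n)))

def first_violating_n(E, n_max=20):
    """Smallest n <= n_max for which the bound exceeds 1, or None.

    n_max is kept small because 1 - h(P_k) loses floating-point
    precision when E^n becomes tiny.
    """
    for n in range(1, n_max + 1):
        if ic_lower_bound(E, n) > 1.0:
            return n
    return None

if __name__ == "__main__":
    for E in (0.70, 0.72, 0.75, 1.0):
        print(E, 2 * E * E > 1, first_violating_n(E))
```

Under this assumed bound, values of $E$ below $\frac{1}{\sqrt{2}}$ never push it above 1, while values above the threshold do so once $n$ is large enough.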

This result connects the Tsirelson bound with a compelling physical principle. However,
there are also non-quantum distributions that lie under the quantum threshold and hence are
not excluded by the previous argument. It is still not known whether Information Causality
singles out the entire set of quantum distributions. A partial answer was provided a few months
after the first paper on Information Causality was released [ABPS09].

The authors present two families of distributions which can be written in the form

$$PR_{\alpha,\beta} = \alpha PR + \beta B + (1 - \alpha - \beta) I, \qquad \text{(C.9)}$$

where $I$ is the uniform uncorrelated distribution and $PR$ is the usual PR box. In the first
family, $B$ is one of the non-local boxes given by

$$NL^{\mu\nu\sigma}(ab|xy) = \begin{cases} \frac{1}{2} & \text{if } a + b = xy + \mu x + \nu y + \sigma \\ 0 & \text{otherwise,} \end{cases}$$

with $\mu\nu\sigma$ any sequence of bits except 000 and 001. The distribution $PR_{\alpha,\beta}$
will be quantum iff

$$\alpha^2 + \beta^2 \leq \frac{1}{2},$$

which is a necessary and sufficient condition for Information Causality to be satisfied if
$\mu\nu\sigma = 010$, $011$, $100$ or $101$. (For $\beta = 0$ this reduces to
$\alpha \leq \frac{1}{\sqrt{2}}$, in agreement with the threshold found above.) Hence, in this
slice of the no-signaling polytope, Information Causality singles out the boundary of the set
of quantum distributions. This is shown in figure C.1 (a). The condition for Information
Causality in the case $\mu\nu\sigma = 111$ is

$$\alpha \leq \frac{1}{\sqrt{2}},$$

which gives the quantum maximum value for CHSH. Hence, this protocol cannot discard
non-quantum boxes below the Tsirelson bound. This is shown in figure C.1 (b). It is not known
whether these boxes violate Information Causality in this slice of the no-signaling polytope.
In the second family, $B$ is one of the local boxes given by

$$L^{\mu\nu\sigma\tau}(ab|xy) = \begin{cases} 1 & \text{if } a = \mu x + \nu,\ b = \sigma y + \tau \\ 0 & \text{otherwise,} \end{cases} \qquad \text{(C.11)}$$

with $\mu\sigma + \nu + \tau = 0$. For these distributions, Information Causality is violated iff

$$(\alpha + \beta)^2 + \alpha^2 > 1.$$

This inequality does not coincide with the criterion for quantumness. For this family it is
possible to exclude several non-quantum correlations below the Tsirelson bound, but with the
strategy used, it is not possible to reach the quantum boundary. This is shown in figure C.1 (c).
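
To make the two families concrete, the boxes can be tabulated and their CHSH value computed. The following is a minimal sketch, assuming the convention $S_{CHSH} = E_{00} + E_{01} + E_{10} - E_{11}$ and all arithmetic on bits taken modulo 2; the function names are hypothetical.

```python
import itertools

BITS = (0, 1)

def nl_box(mu, nu, sigma):
    """p(ab|xy) = 1/2 if a + b = xy + mu*x + nu*y + sigma (mod 2), else 0."""
    return {
        (a, b, x, y): 0.5 if (a ^ b) == ((x & y) ^ (mu & x) ^ (nu & y) ^ sigma) else 0.0
        for a, b, x, y in itertools.product(BITS, repeat=4)
    }

def local_box(mu, nu, sigma, tau):
    """p(ab|xy) = 1 if a = mu*x + nu and b = sigma*y + tau (mod 2), else 0."""
    if ((mu & sigma) ^ nu ^ tau) != 0:
        raise ValueError("requires mu*sigma + nu + tau = 0 (mod 2)")
    return {
        (a, b, x, y): 1.0 if a == ((mu & x) ^ nu) and b == ((sigma & y) ^ tau) else 0.0
        for a, b, x, y in itertools.product(BITS, repeat=4)
    }

def mixture(alpha, beta, box_b):
    """PR_{alpha,beta} = alpha*PR + beta*B + (1 - alpha - beta)*I."""
    pr = nl_box(0, 0, 0)  # the PR box is NL^{000}
    return {k: alpha * pr[k] + beta * box_b[k] + (1 - alpha - beta) * 0.25 for k in pr}

def chsh(p):
    """S = E00 + E01 + E10 - E11 with E_xy = p(a=b|xy) - p(a!=b|xy)."""
    total = 0.0
    for x, y in itertools.product(BITS, repeat=2):
        corr = sum((-1) ** (a ^ b) * p[(a, b, x, y)] for a in BITS for b in BITS)
        total += (-1) ** (x & y) * corr
    return total

if __name__ == "__main__":
    alpha, beta = 0.6, 0.3
    print(chsh(mixture(alpha, beta, nl_box(1, 1, 1))))        # first family
    print(chsh(mixture(alpha, beta, local_box(1, 0, 1, 1))))  # second family
```

In this sketch the box $NL^{111}$ contributes nothing to the CHSH value, so the first mixture gives $4\alpha$, whereas each admissible local box contributes 2, so the second gives $4\alpha + 2\beta$.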


Alice and Bob's game can be generalized to alphabets with more than two elements [CSS10].
Instead of a string of bits, Alice now receives a string of dits, random variables
with $d$ possible outcomes. Her message is also changed: she is now allowed to send Bob
$m$ dits. Their goal remains the same: Bob receives a position $y$ in Alice's string and he has to
guess the dit she has in that specific position. The efficiency of their protocol can be measured
by the quantity

$$I = \sum_{k=0}^{n} I(a_k : b_k | y = k),$$

where $I(a_k : b_k | y = k)$ is the mutual information between Alice's $k$th dit $a_k$ and Bob's
guess $b_k$, given that he was asked to guess her dit in position $k$. Information Causality will
be violated as soon as

$$I > m \log_2 d.$$

Let us focus on the case where Alice receives a string of two dits $a = (a_0, a_1)$,
$a_i \in \{0, 1, \ldots, d-1\}$. Bob receives a bit $y$ that tells him whether he has to guess the
first or the second dit in Alice's string. Since she only sends him one dit, Information
Causality requires that $I \leq \log_2 d$. If Alice and Bob share the no-signaling distribution
with $d$ inputs on Alice's side, 2 inputs on Bob's side and $d$ outputs on both sides given by

$$PR_d(ab|xy) = \begin{cases} \frac{1}{d} & \text{if } xy = (b - a) \bmod d \\ 0 & \text{otherwise,} \end{cases}$$

there is a protocol in which Information Causality is violated.




Figure C.1: (a) In this slice of the no-signaling polytope, the principle of Information
Causality singles out the boundary of the quantum set. (b) In this slice, the same protocol
is not able to explain the boundary of the quantum set. (c) In this slice, the same
protocol gets close to the quantum boundary. This image was taken from reference [ABPS09].


As inputs of the $PR_d$ box, Alice uses $x = (a_1 - a_0) \bmod d$ and Bob uses $y$. She gets
output $a$ and he gets output $b$. Alice sends the message $m = (a - a_0) \bmod d$. Bob, in
possession of $m$, makes his guess $g = (b - m) \bmod d = (b - a + a_0) \bmod d$. Given that the
inputs and outputs are correlated according to the rule $xy = (b - a) \bmod d$, we have

$$g = [(a_1 - a_0)y + a_0] \bmod d,$$

which is equal to $a_0$ if $y = 0$ and equal to $a_1$ if $y = 1$.

Therefore, using this protocol, Bob can guess any of her dits with certainty. This means
that

$$I = 2 \log_2 d,$$

clearly violating Information Causality.
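
The dit version of the protocol can be checked in the same way as the binary one. The following is a minimal sketch, assuming an idealized $PR_d$ box modeled as a sampler with uniform $a$ and $b = (a + xy) \bmod d$; the function names are hypothetical.

```python
import random

def pr_d_box(x, y, d, rng):
    """Idealized PR_d box (assumed model): a uniform, (b - a) mod d = x*y."""
    a = rng.randrange(d)
    b = (a + x * y) % d
    return a, b

def dit_round(a0, a1, y, d, rng):
    """Run one round and return Bob's guess g for the dit in position y."""
    x = (a1 - a0) % d          # Alice's input
    a, b = pr_d_box(x, y, d, rng)
    m = (a - a0) % d           # the single dit Alice sends
    return (b - m) % d         # Bob's guess g = (b - m) mod d

# Exhaustive check over all pairs of dits and both positions for d = 5.
rng = random.Random(0)
d = 5
for a0 in range(d):
    for a1 in range(d):
        for y in (0, 1):
            assert dit_round(a0, a1, y, d, rng) == (a0, a1)[y]
```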

We can also see what happens when we use noisy boxes of the type

$$PR_d(E) = E\,PR_d + (1 - E) I.$$

There is a protocol using nested boxes of this kind that achieves a success probability of

$$P = \frac{(d - 1)E^n + 1}{d},$$

where $n$ is the number of boxes used.
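
As a consistency check, offered here only as a short worked example, the expression for $P$ reduces to the binary result $P_k = \frac{1}{2}(1 + E^n)$ when $d = 2$, gives certainty for noiseless boxes, and gives a uniformly random guess for pure noise:

```latex
\begin{align*}
d = 2:&\quad P = \frac{(2-1)E^n + 1}{2} = \frac{1 + E^n}{2} = P_k, \\
E = 1:&\quad P = \frac{(d-1) + 1}{d} = 1, \\
E = 0:&\quad P = \frac{1}{d}.
\end{align*}
```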

Figure C.2 shows the critical value of $E$ at which Information Causality ceases to be
violated. For $d = 2$ we return to the case discussed previously: for values of $E$
above $\frac{1}{\sqrt{2}}$ Information Causality is violated. This is also the bound for quantum
distributions. For $d > 2$ the situation becomes richer. The quantum bound is no longer known
and the critical value at which Information Causality ceases to be violated can be smaller
than $\frac{1}{\sqrt{2}}$.


Figure C.2: Critical level of noise $E$ for which Information Causality ceases to be
violated, as a function of the number of boxes used and for different values of $d$ ($d = 2$,
blue dots; $d = 5$, purple squares; $d = 10$, green diamonds). The solid line corresponds to
the Macroscopic Locality bound (see section C.4). This image was taken from reference [CSS10].


C.4 Macroscopic non-locality 

The motivation for the definition of Macroscopic Locality is not to identify the principle behind
quantum theory, but rather to understand how to go beyond it [NW09]. One of the most
important problems of current research in theoretical physics is to reconcile quantum theory
and general relativity, and a first step towards this goal is to derive general results that should
apply to any theory satisfying a set of reasonable requirements. Macroscopic Locality may be
one of them. The idea behind this principle is that any such theory should recover the classical
results when we measure a large number of equally prepared systems and our devices are not
capable of identifying individual particles.

In the kind of experiments we have considered so far, two parties, Alice and Bob, share
individual particles correlated according to some distribution $p(ab|xy)$, where, as usual, $x$
and $y$ label the possible measurements and $a$ and $b$ the possible outcomes on Alice's and
Bob's side, respectively. We refer to this kind of experiment as a microscopic experiment.

In a macroscopic experiment, Alice and Bob share a huge number $N \gg 1$ of pairs of particles
correlated according to the distribution $p(ab|xy)$. They do not interact with a single particle
but with a beam of them; hence they are not able to address the particles individually, and any
operation they perform is applied to all the particles in the beam at the same time.

After Alice and Bob perform some measurement on their particles, each beam is divided
into a number of different beams, each one corresponding to one possible outcome of that
measurement. In this scenario, the probabilities are no longer important and the intensities of
the beams describe the results of the experiment. If Alice measures $x$, we denote the
intensity of the beam corresponding to outcome $a$ by $I^x_a$, and analogously for Bob.


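Principle 4 (Macroscopic Locality). The distribution of intensities $p(I^x_a, I^y_b)$ Alice and Bob
observe admits a local hidden variable model. This is equivalent to saying that there is a global
distribution

$$p\left(\{I^{x'}_{a'}\}_{x',a'}, \{I^{y'}_{b'}\}_{y',b'}\right)$$

over the intensities of all outcomes of all measurements such that

$$p(I^x_a, I^y_b) = \int p\left(\{I^{x'}_{a'}\}_{x',a'}, \{I^{y'}_{b'}\}_{y',b'}\right) \prod_{(x',a') \neq (x,a)} dI^{x'}_{a'} \prod_{(y',b') \neq (y,b)} dI^{y'}_{b'}. \qquad \text{(C.12)}$$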

Clearly the intensities are related to the distribution $p(ab|xy)$. With this correspondence
written explicitly, it is possible to identify the set of no-signaling distributions satisfying
Macroscopic Locality. This set is very similar to the set of quantum distributions, but it is not
identical.


Theorem 47. The set of macroscopic local non-signaling distributions is equal to the set $Q_1$
introduced in reference [NPA08].

This set is the first in a hierarchy of conditions necessarily satisfied by any distribution
$p(ab|xy)$ obtained with a quantum system. It can be numerically characterized via semidefinite
programming. By definition $Q \subseteq Q_1$, and even in the simplest case, in which each party
has two measurements with two outcomes, the two sets are not the same, although they are
extremely close.

Although Macroscopic Locality is not able to single out the set of quantum distributions 
even in the simplest scenario, it does single out the Tsirelson bound for the CHSH inequality. 

Theorem 48. The maximum value of $S_{CHSH}$ for macroscopic local no-signaling theories is
equal to the Tsirelson bound $2\sqrt{2}$.

Theorem 47 implies that if Macroscopic Locality and no-signaling are fundamental properties
of nature, the set of allowed distributions has to be contained in $Q_1$. If these axioms were
enough to pin down the set of allowed distributions, these distributions would have to come
from a non-quantum theory. On the other hand, Theorem 48 shows that in the same circumstances
a violation of the CHSH inequality above the Tsirelson bound is not possible. The similarities
between $Q_1$ and the quantum set decrease, though, if we increase the number of measurements
available for Alice and Bob and the number of possible outcomes for each measurement. It is
possible then that macroscopic local distributions violate some Bell inequality above the
Tsirelson bound. This observation opens the door for finding non-quantum distributions using
Bell-like experimental scenarios.

Figure C.3: (a) A microscopic experiment. (b) A macroscopic experiment. Image taken
from reference [NW09].


C.4.1 Macroscopically local correlations can violate 
Information Causality 


In section C.3 we showed that if Alice and Bob share a large number of bipartite systems
correlated according to the distribution

$$PR_d(E) = E\,PR_d + (1 - E) I$$

they can apply a nested protocol to violate Information Causality as long as $E$ is above a
certain threshold, which depends on the number of shared distributions used in the protocol and
also on $d$.

When $d$ is equal to 2, whenever $E$ is above $\frac{1}{\sqrt{2}}$ both Information Causality
and Macroscopic Locality are violated. It is also known that $E \leq \frac{1}{\sqrt{2}}$ is a
necessary and sufficient condition for the distribution to be quantum.

The situation $d > 2$ is much more complex. In this case we do not know the condition
on $E$ for quantumness of the distribution. The condition for Macroscopic Locality remains the
same, at least up to $d = 5$: the distribution $PR_d(E)$ violates Macroscopic Locality iff
$E > \frac{1}{\sqrt{2}}$. The critical values for Information Causality, as already mentioned,
depend strongly on $d$. Figure C.2 shows the critical values for different values of $d$, as a
function of the number of boxes available, and also the critical value for Macroscopic Locality.


This observation allows us to prove that some macroscopic local distributions can violate
Information Causality. For example, for $d = 5$, the distribution

$$PR_5(E) = \frac{1}{\sqrt{2}} PR_5 + \left(1 - \frac{1}{\sqrt{2}}\right) I$$

is macroscopic local but can be used to violate Information Causality.

Therefore, Information Causality and Macroscopic Locality are not equivalent. Macroscopic
Locality was proposed not as a principle capable of singling out quantum distributions but rather
as a desirable axiom of any alternative to quantum theory. The fact that some macroscopic local
distributions violate Information Causality shows that if the principle of Information Causality
is also a fundamental property of any non-quantum theory, then the set of distributions it allows
in some scenarios has to be smaller than the set of macroscopic local distributions [CSS10].


C.5 Quantum correlations require multipartite 
information principles 

So far we have seen four different principles proposed to explain quantum nonlocality:
no-signaling, non-triviality of communication complexity, Information Causality and Macroscopic
Locality. Although very fruitful in many senses, these requirements suffer from a common
drawback. All of them are based on a bipartite situation in which two spatially separated parties
share a pair of correlated systems described by some probability distribution.

We can come up with much more interesting situations. Instead of a bipartite scenario,
we can imagine an $n$-partite system shared among $n > 2$ spatially separated parties. What
physical principles explain the set of quantum distributions in such a general situation?

There is a trivial way of applying the bipartite requirements we have studied before to
distributions in a multipartite scenario. We can consider the situation in which Alice holds $k$
of these subsystems and Bob the remaining $n - k$, and apply the bipartite principles to the
distribution obtained in this way. We may conjecture that by applying some of these principles
to all possible bipartitions we would be able to single out the set of quantum distributions
also in the multipartite scenario. Unfortunately this is not the case [GWA+11].

The problem is that there are some non-quantum multipartite distributions that behave
exactly like local distributions for every possible bipartition. One example of such distributions
is found in the set of tripartite distributions admitting a time-ordered bilocal model
[PBS11, GWAN12].

Let $p(a_1 a_2 a_3 | x_1 x_2 x_3)$ denote the probability of getting outcomes $a_1$, $a_2$ and
$a_3$, respectively, when the first party applies measurement $x_1$, the second party applies
measurement $x_2$ and the third party applies measurement $x_3$.

Definition 75. We say that the distribution $p(a_1 a_2 a_3 | x_1 x_2 x_3)$ admits a time-ordered
bilocal model (TOBL) if it can be written in the form

$$p(a_1 a_2 a_3 | x_1 x_2 x_3) = \sum_\lambda p^{i|jk}_\lambda \, p^\lambda(a_i | x_i) \, p^\lambda_{j \to k}(a_j a_k | x_j x_k) \qquad \text{(C.13)}$$

$$= \sum_\lambda p^{i|jk}_\lambda \, p^\lambda(a_i | x_i) \, p^\lambda_{j \leftarrow k}(a_j a_k | x_j x_k) \qquad \text{(C.14)}$$

for $(i,j,k) = (1,2,3), (2,3,1), (3,1,2)$. The distributions $p^\lambda_{j \to k}(a_j a_k | x_j x_k)$
and $p^\lambda_{j \leftarrow k}(a_j a_k | x_j x_k)$ are allowed to be signaling in at most one
direction, as indicated by the arrow.

These models have a very clear operational meaning. Let us consider first the case
$(i,j,k) = (1,2,3)$. This case corresponds to the bipartition 1|23: the first subsystem is with
Alice and the other two are with Bob. The equation

$$p(a_1 a_2 a_3 | x_1 x_2 x_3) = \sum_\lambda p^{1|23}_\lambda \, p^\lambda(a_1 | x_1) \, p^\lambda_{2 \to 3}(a_2 a_3 | x_2 x_3)$$

means that under this bipartition, the distribution admits a local hidden variable model,
$\lambda$ being the hidden variable. The fact that $p^\lambda_{2 \to 3}(a_2 a_3 | x_2 x_3)$ may
be signaling is not an issue here because systems 2 and 3 are now seen as one, and hence the
notion of signaling between them makes no sense.

Since $(i,j,k)$ can vary over all possible permutations, the same happens for the other
bipartitions 2|13 and 3|12. This implies that whenever we consider a bipartition of a TOBL
distribution, the bipartite distribution obtained will be local. This remains true if we
concatenate any number of them under wiring, which is the most general operation we can apply to
this set of distributions [ABL+09]. This implies that TOBL distributions cannot violate any of
the bipartite principles mentioned above.

The important observation is that there are TOBL distributions that are not quantum. This
can be seen with the help of a famous Bell inequality for the (3,2,2) scenario, known as the
Guess Your Neighbor's Input inequality:

$$p(000|000) + p(110|011) + p(011|101) + p(101|110) \leq 1.$$

For this inequality the quantum bound is also 1, that is, there is no quantum violation in this
case. The maximal value obtained with TOBL distributions exceeds 1, which proves the existence
of TOBL distributions outside the quantum set.
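
The local bound of this inequality can be confirmed by brute force. The following is a minimal sketch that enumerates every deterministic local strategy (each party maps its input bit to an output bit); shared randomness only mixes such strategies and so cannot exceed their maximum. The names `GYNI_EVENTS`, `gyni_value` and `local_bound` are hypothetical.

```python
import itertools

# The four (outputs, inputs) events of the inequality, as written in the text.
GYNI_EVENTS = [
    ((0, 0, 0), (0, 0, 0)),
    ((1, 1, 0), (0, 1, 1)),
    ((0, 1, 1), (1, 0, 1)),
    ((1, 0, 1), (1, 1, 0)),
]

def gyni_value(strategy):
    """Value of the left-hand side for a deterministic strategy.

    strategy[i] = (output on input 0, output on input 1) for party i.
    Each term is 1 if the strategy produces the listed outputs, else 0.
    """
    return sum(
        all(strategy[i][x[i]] == a[i] for i in range(3))
        for a, x in GYNI_EVENTS
    )

def local_bound():
    """Maximum over all 4^3 = 64 deterministic local strategies."""
    functions = list(itertools.product((0, 1), repeat=2))  # maps bit -> bit
    return max(gyni_value(s) for s in itertools.product(functions, repeat=3))

if __name__ == "__main__":
    print(local_bound())
```

No deterministic strategy can satisfy two of the four events at once, since any two of them require different outputs from some party for the same input.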

Another example is provided in reference [YCA+12]. The authors study violations of the
principle of Information Causality in the presence of extremal no-signaling distributions in a
tripartite scenario. They prove that there are distributions which cannot be discarded by any
bipartite physical principle.

Hence, neither the bipartite principles proposed so far nor any other that may
be proposed in the future will be able to single out the set of quantum distributions in the
multipartite scenario, because none of them is capable of ruling out the TOBL distributions.
This result implies that intrinsically multipartite principles are required to fully understand the
set of quantum distributions in more complicated situations.


C.6 Local orthogonality: the exclusivity principle 
for Bell scenarios 

Unlike all other principles we have mentioned previously in this appendix, the Exclusivity principle
can be applied directly to all Bell scenarios, including the ones with multiple parties. In this
situation, the principle is commonly referred to as the principle of Local Orthogonality [FSA+13].

Suppose a composite system is shared among $n$ spatially separated parties. In each party
an experimentalist can apply $m$ measurements with $d$ possible outcomes. The possible events
in this scenario are of the form

$$(a_1, a_2, \ldots, a_n | x_1, x_2, \ldots, x_n),$$

where $x_i$ stands for the measurement performed by party $i$ and $a_i$ for the corresponding
outcome.

Definition 76. Two events

$$e_1 = (a_1, a_2, \ldots, a_n | x_1, x_2, \ldots, x_n) \quad \text{and} \quad e_2 = (a'_1, a'_2, \ldots, a'_n | x'_1, x'_2, \ldots, x'_n)$$

are exclusive or locally orthogonal if they involve different outputs of the same measurement
by (at least) one party:

$$x_i = x'_i \quad \text{and} \quad a_i \neq a'_i \quad \text{for some } i.$$

A collection of events $\{e_i\}$ is locally orthogonal if the events are pairwise locally orthogonal.

As before, the Exclusivity principle demands that if a set of events $\{e_i\}$ is locally orthogonal, then

$$\sum_i p(e_i) \leq 1. \qquad \text{(C.15)}$$

Such an inequality is called an orthogonality inequality.
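
Definition 76 translates into a short test on pairs of events. The following is a minimal sketch, assuming an event is represented as a pair of strings `(outputs, inputs)` with one character per party; this representation and the function names are hypothetical choices made for illustration.

```python
from itertools import combinations

def locally_orthogonal(e1, e2):
    """True if some party has the same input but a different output."""
    (a1, x1), (a2, x2) = e1, e2
    return any(
        xi == xj and ai != aj
        for ai, xi, aj, xj in zip(a1, x1, a2, x2)
    )

def is_orthogonal_set(events):
    """True if the events are pairwise locally orthogonal."""
    return all(locally_orthogonal(e, f) for e, f in combinations(events, 2))

# The four events of the Guess Your Neighbor's Input inequality.
gyni = [("000", "000"), ("110", "011"), ("011", "101"), ("101", "110")]

if __name__ == "__main__":
    print(is_orthogonal_set(gyni))
    print(locally_orthogonal(("00", "00"), ("00", "01")))
```

A set that passes `is_orthogonal_set` yields an orthogonality inequality of the form (C.15) by summing the probabilities of its events.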

The set of distributions that satisfy all LO inequalities in this scenario is denoted by
$\mathcal{LO}^1$. As shown in [CSW10], for bipartite scenarios this set is equal to the set of
no-signaling distributions, but this equivalence is no longer valid for more parties. Already in
the (3,2,2) scenario the no-signaling set and $\mathcal{LO}^1$ are no longer equal. All
orthogonality inequalities in this case are equivalent under local operations to the Guess Your
Neighbor's Input inequality

$$p(000|000) + p(110|011) + p(011|101) + p(101|110) \leq 1,$$

which no-signaling distributions can violate. Numerical data suggest that the
gap between the two sets increases with the number of parties, but already for $n = 5$ the problem
becomes intractable due to the huge size of the exclusivity graph.

Violations of Local Orthogonality can exhibit activation effects. A larger distribution built
from several copies of $p \in \mathcal{LO}^1$ does not necessarily satisfy Local Orthogonality.
Consider $k$ copies of an $n$-partite system with distribution $p$, distributed among $kn$
parties, each party having access to only one subsystem of one of the copies. If the resulting
distribution $p^{\otimes k}$ satisfies all Local Orthogonality inequalities for the $(kn, m, d)$
scenario we say that $p$ belongs to the set $\mathcal{LO}^k$. We denote by $\mathcal{LO}^\infty$
the set of distributions in the $(n, m, d)$ scenario that belong to $\mathcal{LO}^k$ for all $k$.

To see the consequences of imposing the Local Orthogonality principle, we have to
characterize the sets $\mathcal{LO}^k$, which requires that we identify all Local Orthogonality
inequalities for a given scenario. As we have already seen, this is a hard problem, equivalent to
finding all maximal cliques of the exclusivity graph of the scenario.

At first sight, it seems that Local Orthogonality would not be capable of ruling out
non-quantum distributions in the bipartite scenario because of the equivalence between this
principle and no-signaling, but this is not the case. Due to the activation effects, imposing
Local Orthogonality at the multipartite level leads to the detection of non-quantumness even in
the bipartite case.

Already for the simplest scenario (2,2,2), Local Orthogonality is able to rule out the PR
box if we use two copies of this distribution. Suppose that parties 1 and 2 are in possession of
one of the copies and parties 3 and 4 are in possession of the other copy. Then, the value of
the sum

$$p(0000|0000) + p(1110|0011) + p(0011|0110) + p(1101|1011) + p(0111|1101)$$

is equal to $\frac{5}{4}$, since each of the five terms equals $\frac{1}{4}$, while Local
Orthogonality demands this value to be less than or equal to 1. The same reasoning allows us to
rule out other distributions obtained from the PR box by adding noise. Consider the family of
distributions given by

$$PR(\alpha) = \alpha PR + (1 - \alpha) I,$$

where $I$ is the distribution where all parties are independent and the probabilities for all
measurements are uniform. Two copies of $PR(\alpha)$ violate Local Orthogonality for all
$\alpha > 0.72$. This value is close to the quantum bound of
$\alpha = \frac{1}{\sqrt{2}} \approx 0.707$.
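
The two-copy argument for the PR box can be verified exactly. The following is a minimal sketch, with the fourth event taken as (1101|1011), the choice for which all five events are pairwise locally orthogonal; events are written as `(outputs, inputs)` strings for parties 1 to 4, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def pr(a, b, x, y):
    """PR box: p(ab|xy) = 1/2 if a + b = xy (mod 2), else 0."""
    return Fraction(1, 2) if (a ^ b) == (x & y) else Fraction(0)

def two_copies(event):
    """Probability of a 4-party event when parties (1,2) and (3,4) each share a PR box."""
    outs, ins = event
    a1, a2, a3, a4 = (int(c) for c in outs)
    x1, x2, x3, x4 = (int(c) for c in ins)
    return pr(a1, a2, x1, x2) * pr(a3, a4, x3, x4)

def locally_orthogonal(e1, e2):
    """True if some party has the same input but a different output."""
    return any(
        xi == xj and ai != aj
        for ai, xi, aj, xj in zip(e1[0], e1[1], e2[0], e2[1])
    )

EVENTS = [
    ("0000", "0000"),
    ("1110", "0011"),
    ("0011", "0110"),
    ("1101", "1011"),
    ("0111", "1101"),
]

pairwise_orthogonal = all(locally_orthogonal(e, f) for e, f in combinations(EVENTS, 2))
total = sum(two_copies(e) for e in EVENTS)

if __name__ == "__main__":
    print(pairwise_orthogonal, total)
```

Exact rational arithmetic is used so that the comparison of `total` with the bound 1 involves no rounding.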

Local Orthogonality also rules out all extremal distributions in the $(2,2,d)$ scenario.
This happens because we can use them to simulate a PR box, perfectly with one copy if $d$ is
even and arbitrarily well with sufficiently many copies if $d$ is odd.

Local Orthogonality is very successful in the bipartite case, as it rules out many distributions
and gets close to the Tsirelson bound. But it is for $n > 2$ that we expect it to perform better
than the previous principles, since its definition is intrinsically multipartite. It is possible to
prove that all extremal distributions in the (3,2,2) scenario lie outside $\mathcal{LO}^1$ or
$\mathcal{LO}^2$. The distributions used in section C.5 as examples of non-quantum distributions
that satisfy all bipartite principles are also ruled out by Local Orthogonality, since they
violate the Guess Your Neighbor's Input inequality. Local Orthogonality rules out distributions
where all other known principles fail.


C.7 Final Remarks 

An important problem in Physics is to understand what kind of correlations can be observed
between measurements conducted on spatially separated physical systems that have interacted in
the past. Quantum theory predicts stronger correlations than the ones that can be obtained with
classical systems, which leads to violations of Bell inequalities. At least mathematically, there
is room for more: quantum systems do not reach the algebraic maximum violation of several
Bell inequalities, which can be reached only with some non-quantum distributions, obtained
using more general probabilistic theories. Why do we not observe these stronger correlations in
nature? Is there any physical principle that forbids probability distributions outside the quantum
set?


No-signaling is certainly a property we should impose on the distributions in order to discard 
the unphysical ones, but it is not enough to single out the quantum set. The no-signaling 
distribution of a PR box can reach the algebraic maximum of the CHSH inequality, while the 
Tsirelson bound lies below this value. Nonetheless, the existence of such distributions would 
have strange consequences in the field of communication complexity. If the parties are allowed 
to share an arbitrary number of PR boxes, any distributed function would require only one bit 
of communication between the parties to be computed, making the notion of communication 
complexity meaningless. Although this does not contradict any principle, it goes against our 
experience that some problems are harder to solve than others. Although trivial communication 
complexity was found with violations strictly less than 4, it is still not clear if the Tsirelson bound 
for the CHSH inequality is a critical value that separates trivial from nontrivial communication 
complexity. 

Information Causality is a principle with an information-theoretic motivation. It can also be
used to discard several non-quantum distributions. For the CHSH inequality it is known that any
violation above the Tsirelson bound also violates Information Causality. In more sophisticated
situations, it is known that this principle can rule out many non-quantum distributions, but it is
not known whether we can relate this to the Tsirelson bound of more complicated inequalities,
nor whether it singles out the entire set of quantum distributions. It remains an open question
whether this whole zoo of nonlocality can be derived from Information Causality.

Information Causality was also used to derive limits on Hardy's non-locality [Har93]. It has
been shown that any generalized probability theory which gives completely random results for
local dichotomic observables can provide Hardy's non-local correlations and satisfy Information
Causality at the same time [AKR+10, GRKR10]. Nevertheless, there are some restrictions
imposed by quantum theory that cannot be explained by the Information Causality condition
considered.

The principle of Macroscopic Locality is a reasonable property we should expect from any 
physical theory, since any such theory should recover the classical results when the number of 
particles goes to infinity. The set of macroscopic local correlations is not equal to the quantum 
set. They are close for the (2,2,2) scenario, but the similarities decrease if we increase the 
number of measurements available or the number of possible outcomes for each measurement. 
Though this principle can not recover the quantum set, it may help us to understand how to 
derive generalizations of quantum theory and reconcile it with general relativity. 

Although Macroscopic Locality is not able to single out the set of quantum distributions 
even in the simplest scenario, it does single out the Tsirelson bound for the CHSH inequality. It 
is still an open problem to prove that macroscopic local distributions violate some Bell inequality 
above the Tsirelson bound. 

This principle was also used to derive quantum Bell inequalities, linear inequalities that
provide necessary conditions for a distribution to be quantum [YNSS11]. The method is applicable
to all bipartite scenarios. Such inequalities provide analytical approximations to the quantum
set, which are difficult to find in general.

Although the principles above are very fruitful in many different situations, they are not 
enough to explain the set of quantum distributions in scenarios with more than two parties. 
Some non-quantum distributions in a tripartite scenario have been found that behave like 
classical distributions for all possible bipartitions. This implies that, in order to explain the 
quantum set in more complicated scenarios, intrinsically multipartite principles must be used. 

The only multipartite principle proposed so far is Local Orthogonality, the Exclusivity
principle applied to Bell scenarios. Local Orthogonality is very successful in the bipartite case, as
it rules out the extremal boxes in the $(2,2,d)$ scenario for any $d$, as well as many others for
$d = 2$, approaching the Tsirelson bound for the CHSH inequality. For $n > 2$ we expect it to
perform better than the previous principles. It is possible to prove that all extremal distributions
in the (3,2,2) scenario violate Local Orthogonality with one or two copies. Some non-quantum
distributions that satisfy all bipartite principles are also ruled out by Local Orthogonality.

The difficulty in establishing the consequences of this principle in other scenarios lies in the
fact that the exclusivity graph becomes intractable when we increase the number of parties,
measurements or outcomes, which makes numerical computations infeasible. Nevertheless,
Local Orthogonality rules out distributions where all other known principles fail. This
corroborates the conjecture that the Exclusivity principle is the fundamental principle that
singles out the set of quantum distributions.